Abstract Tree representations of (sets of) symmetric binary relations, or
equivalently edge-colored undirected graphs, are of central interest, e.g. in
phylogenomics. In this context symbolic ultrametrics play a crucial role. A
symbolic ultrametric defines an edge-colored complete graph whose topology can
be represented as a vertex-colored tree. Here, we are interested in the
structure and the complexity of certain combinatorial problems resulting from
considerations based on symbolic ultrametrics, and in algorithms to solve them.

This includes the characterization of symbolic ultrametrics that additionally
distinguish between edges and non-edges of arbitrary edge-colored graphs $G$
and thus yield a tree representation of $G$, by means of so-called cographs.
Moreover, we address the problem of finding “closest” symbolic ultrametrics and
show the NP-completeness of the three problems symbolic ultrametric editing,
completion and deletion. Finally, as not all graphs are cographs, and hence do
not have a tree representation, we ask what is the minimum number of cotrees
needed to represent the topology of an arbitrary non-cograph $G$. This is
equivalent to finding an optimal cograph edge $k$-decomposition
$\{E_1, \dots, E_k\}$ of $E$ so that each subgraph $(V, E_i)$ of $G$ is a
cograph. We investigate this problem in full detail, resulting in several new
open problems and NP-hardness results.

For all optimization problems proven to be NP-hard we will provide integer 
linear program (ILP) formulations to efficiently solve them. 

Keywords Symbolic Ultrametric • Cograph • Edge Partition • Editing • 
Integer Linear Program (ILP) • NP-complete 


1 Introduction 

Tree representations of relations between certain objects lie at the heart of
many problems, in particular in phylogenomic studies. Phylogenetic
reconstructions are concerned with the study of the evolutionary history of
groups of systematic biological units, e.g. genes or species. The objective is
the assembling of so-called phylogenetic trees or networks that represent a
hypothesis about the evolutionary ancestry of a set of genes, species or other
taxa.

Consider a symmetric map $\delta\colon V \times V \to M^{\odot}$ that assigns
to each pair $(x, y)$ a symbol or color $m \in M^{\odot}$. The question then
arises whether it is possible to determine a rooted tree $T$ with a
vertex-labeling $t$ so that the lowest common ancestor $\mathrm{lca}(x, y)$ of
distinct leaves $x$ and $y$ in $T$ is labeled with $m \in M^{\odot}$ if and
only if $\delta(x, y) = m$. Such a tree is then called a symbolic
representation of $\delta$. In phylogenomics, such maps $\delta$ can be
interpreted as an assignment of evolutionary relationships between two genes,
and the symbolic representation of such relations then reflects the
evolutionary history together with the respective events that happened when
two genes diverged. It has recently been shown that in theory [19] and in
practice it is even possible to reconstruct the evolutionary history of the
species from which the genes have been taken, whenever the symbolic
representation is known.

The problem of finding such symbolic representations $(T, t)$ was first
addressed by Böcker and Dress [1] in a mathematical context. The authors
showed that there is a symbolic representation $(T, t)$ of $\delta$ if and only
if the map $\delta$ fulfills the properties of a so-called symbolic
ultrametric [1]. Clearly, any such map $\delta\colon V \times V \to M^{\odot}$
is equivalent to a set of disjoint symmetric binary relations
$\{R_m \mid m \in M^{\odot}\}$ with $(x, y) \in R_{\delta(x,y)}$, or to an
edge-colored complete graph $K_{|V|} = (V, \binom{V}{2})$ in which each edge
$[x, y]$ obtains the color $\delta(x, y) \in M^{\odot}$. In [15] a
characterization of symbolic ultrametrics in graph-theoretical terms has been
given. It has been shown that there is a symbolic representation $(T, t)$ of
such an edge-colored graph $K_{|V|}$ if the edges of each cycle of length 3
have at most two colors and each monochromatic subgraph $G_m$, i.e., the
subgraph that consists of all the edges having a fixed color $m$, is a
so-called cograph. Cographs are characterized by the absence of induced paths
$P_4$ on four vertices, although there are a number of equivalent
characterizations of cographs (see e.g. [5] for a survey). Moreover, Lerchs
showed that each cograph $G = (V, E)$ is associated with a unique rooted tree
$T(G)$, called cotree.

In this contribution we address several combinatorial problems that are 
concerned with symbolic ultrametrics and tree representations of arbitrary, 
possibly edge-colored graphs. 

We first investigate in Section 3 under what conditions it is possible to find
a symbolic ultrametric for arbitrary graphs $G$ so that edges and non-edges of
$G$ can be distinguished. In other words, we ask for an edge-coloring of $G$ so
that edges and non-edges always obtain different colors and this coloring
satisfies the conditions of a symbolic ultrametric. If such a coloring is known
for $G$, then one can immediately display the topology of $G$ as a tree via a
symbolic representation $(T, t)$. It does not come as a big surprise that such
a symbolic ultrametric can be defined for $G$ if and only if $G$ is already a
cograph. This, in particular, establishes another new characterization of
cographs. As a consequence we can infer that any symbolic representation
$(T, t)$ of a cograph $G$ is a so-called refinement of its cotree.

In practice, however, symmetric maps $d\colon V \times V \to M^{\odot}$ often
represent only estimates of the true relationship $\delta$ between the
investigated objects, e.g., genes. Thus, in general such estimates $d$ will
not be a symbolic ultrametric. Hence, there is a great interest in optimally
editing $d$ to a symbolic ultrametric $\delta$, i.e., finding a minimum number
of changes of the assignment $d(x, y) \in M^{\odot}$ to pairs $(x, y)$ so that
there is a symbolic representation of the resulting map $\delta$. So far, the
complexity of this problem has been unknown. In Section 4 we show that (the
decision version of) this problem, called Symbolic Ultrametric Editing, is
NP-complete. Additionally, we show that the problems Symbolic Ultrametric
Completion and Symbolic Ultrametric Deletion are NP-complete and provide
integer linear program (ILP) formulations in order to efficiently solve the
latter three problems in future work.

A further combinatorial problem we consider in Section 5 is motivated by the
results established in Section 3, where we have characterized graphs for which
one can find symbolic representations by means of cographs. However, not all
graphs are cographs and thus not all graphs have such a tree representation.
Therefore, we ask for the minimum number of cotrees that are needed to
represent the structure of a given graph $G = (V, E)$ in an unambiguous way.
As it will turn out, this problem is equivalent to finding a decomposition
$\Pi = \{E_1, \dots, E_k\}$ of $E$ (i.e., the elements of $\Pi$ need not
necessarily be disjoint) for the least integer $k$, so that each subgraph
$G_i = (V, E_i)$, $1 \le i \le k$, is a cograph. Such a decomposition is called
cograph edge $k$-decomposition, or cograph $k$-decomposition for short. If the
elements of $\Pi$ are in addition pairwise disjoint, we call $\Pi$ a cograph
$k$-partition. We show that the number of such optimal cograph
$k$-decompositions, resp., partitions of a graph can grow exponentially in the
number of vertices. Moreover, non-trivial upper bounds for the integer $k$
such that there is a cograph $k$-decomposition, resp., partition are derived,
and polynomial-time algorithms to compute $\Pi$ with $|\Pi| \le \Delta + 1$,
where $\Delta$ denotes the maximum number of edges a vertex is contained in,
are provided. Furthermore, we will prove that finding the least integer
$k \ge 2$ so that $G$ has a cograph $k$-decomposition or a cograph
$k$-partition is an NP-hard problem. In order to attack this problem in future
work, we derive ILP formulations to solve this problem efficiently. These
findings complement results known about so-called cograph vertex partitions.


2 Essential Definitions 

Graph. In what follows, we consider undirected simple graphs $G = (V, E)$
with vertex set $V(G) = V$ and edge set $E(G) = E \subseteq \binom{V}{2}$. The
complement graph $G^c = (V, E^c)$ of $G = (V, E)$ has edge set
$E^c = \binom{V}{2} \setminus E$. The graph $K_{|V|} = (V, E)$ with
$E = \binom{V}{2}$ is called complete graph. A graph $H = (W, F)$ is an induced
subgraph of $G = (V, E)$ if $W \subseteq V$ and all edges $[x, y] \in E$ with
$x, y \in W$ are contained in $F$. The degree
$\deg(v) = |\{e \in E \mid v \in e\}|$ of a vertex $v \in V$ is defined as the
number of edges that contain $v$. The maximum degree of a graph is denoted by
$\Delta$.

Rooted Tree. A connected graph $T$ is a tree if $T$ does not contain cycles. A
vertex of a tree $T$ of degree one is called a leaf of $T$ and all other
vertices of $T$ are called inner vertices. The set of inner vertices of $T$ is
denoted by $V^0$, and with $E^0$ we denote the set of inner edges, that is, the
edges in $E$ both of whose end vertices are inner vertices. A rooted tree
$T = (V, E)$ is a tree that contains a distinguished vertex $\rho_T \in V$
called the root. The first inner vertex $\mathrm{lca}_T(x, y)$ that lies on
both unique paths from distinct leaves $x$, resp., $y$ to the root is called
lowest common ancestor of $x$ and $y$. If there is no danger of ambiguity, we
will write $\mathrm{lca}(x, y)$ rather than $\mathrm{lca}_T(x, y)$.

Symbolic Ultrametric and Symbolic Representation. In what follows, the set $M$
will always denote a non-empty finite set, the symbol $\odot$ will always
denote a special element not contained in $M$, and
$M^{\odot} := M \cup \{\odot\}$. Now, suppose $X$ is an arbitrary non-empty set
and $\delta\colon X \times X \to M^{\odot}$ a map. We call $\delta$ a symbolic
ultrametric if it satisfies the following conditions:

(U0) $\delta(x, y) = \odot$ if and only if $x = y$;

(U1) $\delta(x, y) = \delta(y, x)$ for all $x, y \in X$, i.e. $\delta$ is symmetric;

(U2) $|\{\delta(x, y), \delta(x, z), \delta(y, z)\}| \le 2$ for all $x, y, z \in X$; and

(U3) there exists no subset $\{x, y, u, v\} \in \binom{X}{4}$ such that
$\delta(x, y) = \delta(y, u) = \delta(u, v) \ne \delta(y, v) = \delta(x, v) = \delta(x, u)$.
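
Conditions (U0) to (U3) can be checked by brute force over all pairs, triples and quadruples. The following Python sketch is only an illustration and not part of the original development: it assumes that $\delta$ is stored as a dictionary on ordered pairs, and the names `DOT`, `two_colored` and `is_symbolic_ultrametric` are hypothetical.

```python
from itertools import combinations, permutations

DOT = "dot"  # assumed encoding of the special symbol that is not in M


def is_symbolic_ultrametric(X, delta):
    """Check (U0)-(U3) for a map delta given as a dict on ordered pairs of X."""
    # (U0): delta(x, y) is the special symbol exactly on the diagonal.
    if any((delta[x, y] == DOT) != (x == y) for x in X for y in X):
        return False
    # (U1): symmetry.
    if any(delta[x, y] != delta[y, x] for x in X for y in X):
        return False
    # (U2): every triple sees at most two different symbols.
    for x, y, z in combinations(X, 3):
        if len({delta[x, y], delta[x, z], delta[y, z]}) > 2:
            return False
    # (U3): no x, y, u, v with
    # delta(x,y) = delta(y,u) = delta(u,v) != delta(y,v) = delta(x,v) = delta(x,u).
    for quad in combinations(X, 4):
        for x, y, u, v in permutations(quad):
            path = {delta[x, y], delta[y, u], delta[u, v]}
            rest = {delta[y, v], delta[x, v], delta[x, u]}
            if len(path) == 1 and len(rest) == 1 and path != rest:
                return False
    return True


def two_colored(X, edges):
    """Helper: symbol 1 on the listed pairs, 0 on all other distinct pairs."""
    E = {frozenset(e) for e in edges}
    return {(x, y): DOT if x == y else int(frozenset((x, y)) in E)
            for x in X for y in X}


X = ["a", "b", "c", "d"]
path = two_colored(X, [("a", "b"), ("b", "c"), ("c", "d")])  # violates (U3)
star = two_colored(X, [("a", "b"), ("a", "c"), ("a", "d")])  # satisfies (U0)-(U3)
print(is_symbolic_ultrametric(X, path), is_symbolic_ultrametric(X, star))
```

The check enumerates all orderings of each quadruple, which is wasteful but keeps the code close to the wording of (U3).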

Now, suppose that $T = (V, E)$ is a rooted tree with leaf set $X$ and that
$t\colon V \to M^{\odot}$ is a map such that $t(x) = \odot$ for all $x \in X$.
To the pair $(T, t)$ we associate the map $d_{(T,t)}$ on $X \times X$ by
setting, for all $x, y \in X$,

$$d_{(T,t)}\colon X \times X \to M^{\odot}; \quad d_{(T,t)}(x, y) = t(\mathrm{lca}_T(x, y)).$$

We call the pair $(T, t)$ a symbolic representation of a map
$\delta\colon X \times X \to M^{\odot}$ if $\delta(x, y) = d_{(T,t)}(x, y)$
holds for all $x, y \in X$. For a subset $W \subseteq X \times X$ we denote
with $\delta(W) = \{m \in M^{\odot} \mid \exists\, (x, y) \in W \text{ s.t. } \delta(x, y) = m\}$
the set of images of the elements contained in $W$.
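
As an illustration of the map $d_{(T,t)}$, the following Python sketch computes it from a labeled rooted tree. The nested-tuple encoding of trees (a leaf is a string, an inner vertex is a pair of label and children) and the names `DOT` and `induced_map` are assumptions made here for the example only.

```python
DOT = "dot"  # assumed encoding of the special symbol


def induced_map(tree):
    """Return d_(T,t) as a dict on ordered pairs of leaves.

    A tree is either a leaf name (str) or a pair (label, children).
    """
    d = {}

    def visit(node):
        if isinstance(node, str):          # leaf: lca(x, x) = x carries DOT
            d[node, node] = DOT
            return [node]
        label, children = node
        groups = [visit(child) for child in children]
        # Leaves from different child subtrees have this vertex as their lca.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for x in left:
                    for y in right:
                        d[x, y] = d[y, x] = label
        return [x for group in groups for x in group]

    visit(tree)
    return d


tree = ("speciation", [("duplication", ["a", "b"]), "c"])
d = induced_map(tree)
print(d["a", "b"], d["a", "c"], d["c", "b"])
```

A pair $(T, t)$ is then a symbolic representation of a given $\delta$ exactly when the returned dictionary agrees with $\delta$ on all pairs.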

We say that $(T, t)$ and $(T', t')$ are isomorphic if $T$ and $T'$ are
isomorphic via a map $\varphi\colon V(T) \to V(T')$ such that
$t'(\varphi(v)) = t(v)$ holds for all $v \in V(T)$.

Cographs and Cotrees. Complement-reducible graphs, cographs for short, are
defined as the class of graphs formed from a single vertex under the closure of
the operations of union and complementation, namely: (i) a single-vertex graph
is a cograph; (ii) the disjoint union of cographs is a cograph; (iii) the
complement of a cograph is a cograph. Alternatively, a cograph can be defined
as a $P_4$-free graph (i.e. a graph such that no four vertices induce a
subgraph that is a path of length 3). A number of equivalent characterizations
of cographs can be found in [5]. It is well-known in the literature concerning
cographs that, to any cograph $G = (V, E')$, one can associate a canonical
cotree $T(G)$. This is a rooted tree with leaf set equal to the vertex set $V$
of $G$ and inner vertices $V^0$ that represent so-called "join" and "union"
operations, together with a labeling map $t\colon V^0 \to \{0, 1\}$ such that
$[x, y] \in E'$ if and only if $t(\mathrm{lca}(x, y)) = 1$, and
$t(v) \ne t(w_i)$ for all $v \in V^0$ and all children
$w_1, \dots, w_k \in V^0$ of $v$ (cf. [5]). We will call the pair $(T, t)$
cotree representation of $G$.
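
The recursive structure of cographs suggests a simple way to build a cotree: a graph on at least two vertices is a cograph only if it is disconnected (a "union" vertex labeled 0) or its complement is disconnected (a "join" vertex labeled 1). The Python sketch below follows this well-known idea; it is not the recognition algorithm of any cited work, it is far from the fastest known method, and the names `components` and `cotree` as well as the nested-tuple output format are assumptions made here.

```python
from itertools import combinations


def components(vertices, edges):
    """Connected components of the graph induced on `vertices`."""
    vertices, seen, comps = set(vertices), set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in vertices
                         if w not in comp and frozenset((v, w)) in edges)
        seen |= comp
        comps.append(sorted(comp))
    return comps


def cotree(vertices, edges):
    """Return the cotree as nested (label, children) pairs, or None if the
    induced graph is not a cograph. `edges` is a set of frozenset pairs."""
    vertices = sorted(vertices)
    if len(vertices) == 1:
        return vertices[0]
    co_edges = {frozenset(p) for p in combinations(vertices, 2)} - edges
    for label, current in ((0, edges), (1, co_edges)):
        comps = components(vertices, current)
        if len(comps) > 1:
            children = [cotree(c, edges) for c in comps]
            return None if None in children else (label, children)
    return None  # graph and complement are both connected: not a cograph


triangle_plus_vertex = {frozenset(p) for p in ("ab", "ac", "bc")}
p4 = {frozenset(p) for p in ("ab", "bc", "cd")}
print(cotree("abcd", triangle_plus_vertex))
print(cotree("abcd", p4))
```

In the first example the root is a union vertex whose children are the join of a, b, c and the isolated vertex d; for the path on four vertices no cotree exists.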

Cograph k-Decomposition and Partition, and Cotree Representation. Let
$G = (V, E)$ be an arbitrary graph. A decomposition
$\Pi = \{E_1, \dots, E_k\}$ of $E$ is called (cograph) $k$-decomposition if
each subgraph $G_i = (V, E_i)$, $1 \le i \le k$, of $G$ is a cograph. We call
$\Pi$ a (cograph) $k$-partition if $E_i \cap E_j = \emptyset$ for all distinct
$i, j \in \{1, \dots, k\}$. A $k$-decomposition $\Pi$ is called optimal if
$\Pi$ has the least number $k$ of elements among all cograph decompositions of
$G$. Clearly, for a cograph only $k$-decompositions with $k = 1$ are optimal.
A $k$-decomposition $\Pi = \{E_1, \dots, E_k\}$ is coarsest if no elements of
$\Pi$ can be unified so that the resulting decomposition is a cograph
$l$-decomposition with $l < k$. In other words, $\Pi$ is coarsest if for all
subsets $I \subseteq \{1, \dots, k\}$ with $|I| > 1$ it holds that
$(V, \bigcup_{i \in I} E_i)$ is not a cograph. Thus, every optimal
$k$-decomposition is also a coarsest one.
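
To make the two notions concrete, the following Python sketch tests whether a given family of edge sets is a cograph $k$-decomposition or $k$-partition. It uses a brute-force $P_4$ test and is meant for small examples only; all function names are hypothetical.

```python
from itertools import combinations, permutations


def is_cograph(V, E):
    """P4-freeness by brute force; E is a set of frozenset edges."""
    def adj(x, y):
        return frozenset((x, y)) in E
    for quad in combinations(V, 4):
        for a, b, c, d in permutations(quad):
            if (adj(a, b) and adj(b, c) and adj(c, d)
                    and not adj(a, c) and not adj(a, d) and not adj(b, d)):
                return False
    return True


def is_k_decomposition(V, E, parts):
    """Every (V, E_i) is a cograph and the E_i together cover exactly E."""
    return set().union(*parts) == E and all(is_cograph(V, Ei) for Ei in parts)


def is_k_partition(V, E, parts):
    """A decomposition whose elements are pairwise disjoint."""
    disjoint = all(not (A & B) for A, B in combinations(parts, 2))
    return disjoint and is_k_decomposition(V, E, parts)


V = "abcd"
ab, bc, cd = (frozenset(p) for p in ("ab", "bc", "cd"))
E = {ab, bc, cd}                                   # the path a-b-c-d
print(is_k_decomposition(V, E, [E]))               # the path itself is no cograph
print(is_k_partition(V, E, [{ab, cd}, {bc}]))      # two disjoint cographs
print(is_k_decomposition(V, E, [{ab, bc}, {bc, cd}]),
      is_k_partition(V, E, [{ab, bc}, {bc, cd}]))  # overlapping elements
```

The last example shows a 2-decomposition that is not a 2-partition, since the edge $[b, c]$ occurs in both elements.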

A graph $G = (V, E)$ is represented by a set of cotrees
$\mathcal{T} = \{T_1, \dots, T_k\}$, each $T_i$ with leaf set $V$, if and only
if for each edge $[x, y] \in E$ there is a tree $T_i \in \mathcal{T}$ with
$t(\mathrm{lca}_{T_i}(x, y)) = 1$.

The Cartesian (Graph) Product $G \Box H$ has vertex set
$V(G \Box H) = V(G) \times V(H)$; two vertices $(g_1, h_1)$, $(g_2, h_2)$ are
adjacent in $G \Box H$ if $[g_1, g_2] \in E(G)$ and $h_1 = h_2$, or
$[h_1, h_2] \in E(H)$ and $g_1 = g_2$. It is well-known that the Cartesian
product is associative, commutative and that the single vertex graph $K_1$
serves as unit element. Thus, the product $\Box_{i=1}^{n} G_i$ of arbitrarily
many factors $G_1, \dots, G_n$ is well-defined. For a given product
$G = \Box_{i=1}^{n} G_i$, we define the $G_i$-layer $G_i^w$ of $G$ (through
the vertex $w$ that has coordinates $(w_1, \dots, w_n)$) as the induced
subgraph with vertex set
$V(G_i^w) = \{v = (v_1, \dots, v_n) \in \times_{i=1}^{n} V(G_i) \mid v_j = w_j \text{ for all } j \ne i\}$.
Note that $G_i^w$ is isomorphic to $G_i$ for all $1 \le i \le n$ and
$w \in V(\Box_{i=1}^{n} G_i)$. The $n$-dimensional hypercube $Q_n$, or $n$-cube
for short, is the Cartesian product $\Box_{i=1}^{n} K_2$.
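
The definition of the Cartesian product translates directly into code. The following Python sketch is an illustration under assumptions made here: a graph is a pair of a vertex list and a set of frozenset edges, and the names `cartesian_product`, `K2` and `hypercube` are hypothetical. Vertices of iterated products are nested tuples rather than flat coordinate vectors.

```python
from itertools import combinations


def cartesian_product(G, H):
    """Cartesian product of two graphs given as (vertices, edges)."""
    (VG, EG), (VH, EH) = G, H
    V = [(g, h) for g in VG for h in VH]
    E = set()
    for (g1, h1), (g2, h2) in combinations(V, 2):
        # Adjacent iff one coordinate is an edge and the other one is equal.
        if (frozenset((g1, g2)) in EG and h1 == h2) or \
           (frozenset((h1, h2)) in EH and g1 == g2):
            E.add(frozenset(((g1, h1), (g2, h2))))
    return V, E


K2 = ([0, 1], {frozenset((0, 1))})


def hypercube(n):
    """Q_n as the n-fold Cartesian product of K_2 (for n >= 1)."""
    Q = K2
    for _ in range(n - 1):
        Q = cartesian_product(Q, K2)
    return Q


V3, E3 = hypercube(3)
print(len(V3), len(E3))
```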


3 Symbolic Ultrametrics and Cographs

Symbolic ultrametrics and respective representations as event-labeled trees
were first characterized by Böcker and Dress [3].

Theorem 1 Suppose $\delta\colon V \times V \to M^{\odot}$ is a map. Then there is a
symbolic representation of $\delta$ if and only if $\delta$ is a symbolic ultrametric.
Furthermore, this representation can be computed in polynomial time.

Now, let $\delta\colon V \times V \to M^{\odot}$ be a map satisfying Properties (U0) and (U1).
Clearly, the map $\delta$ can be considered as an edge coloring of a complete graph
$K_{|V|}$, where each edge $[x, y]$ obtains color $\delta(x, y)$. For each fixed $m \in M$, we
define the undirected graph $G_m := G_m(\delta) = (V, E_m)$ with edge set

$$E_m = \{[x, y] \mid \delta(x, y) = m,\ x, y \in V\}. \tag{1}$$

Hence, $G_m$ denotes the subgraph of the edge-colored graph $K_{|V|}$ that contains
all edges colored with $m \in M$. The following result establishes the connection
between symbolic ultrametrics and cographs.

Theorem 2 ([15]) Let $\delta\colon V \times V \to M^{\odot}$ be a map satisfying Properties (U0)
and (U1). Then $\delta$ is a symbolic ultrametric if and only if

(U2') For all $\{x, y, z\} \in \binom{V}{3}$ there is an $m \in M$ such that $E_m$ contains (at least)
two of the three edges $[x, y]$, $[x, z]$, and $[y, z]$. In other words, for each
triangle induced by $x$, $y$ and $z$, the edges have at most 2 different colors.
(U3') $G_m$ is a cograph for all $m \in M$.
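
Theorem 2 reduces the test for a symbolic ultrametric to a triangle condition and a cograph test on each monochromatic subgraph. The Python sketch below illustrates this on small examples; it assumes that (U0) and (U1) already hold, so $\delta$ is stored on unordered pairs only, and all names are hypothetical.

```python
from itertools import combinations, permutations


def monochromatic_edges(V, delta):
    """E_m for every symbol m; delta maps frozenset({x, y}) to a symbol."""
    E = {}
    for x, y in combinations(V, 2):
        E.setdefault(delta[frozenset((x, y))], set()).add(frozenset((x, y)))
    return E


def has_induced_p4(V, Em):
    def adj(x, y):
        return frozenset((x, y)) in Em
    return any(adj(a, b) and adj(b, c) and adj(c, d)
               and not (adj(a, c) or adj(a, d) or adj(b, d))
               for quad in combinations(V, 4)
               for a, b, c, d in permutations(quad))


def satisfies_theorem_2(V, delta):
    # (U2'): every triangle carries at most two symbols.
    u2 = all(len({delta[frozenset(p)] for p in combinations(t, 2)}) <= 2
             for t in combinations(V, 3))
    # (U3'): every monochromatic subgraph G_m is P4-free, i.e. a cograph.
    u3 = not any(has_induced_p4(V, Em)
                 for Em in monochromatic_edges(V, delta).values())
    return u2 and u3


def coloring(V, colored):
    """Helper: symbol 0 everywhere except on the explicitly colored pairs."""
    delta = {frozenset(p): 0 for p in combinations(V, 2)}
    delta.update({frozenset(p): m for p, m in colored.items()})
    return delta


V = "abcd"
path_one_color = coloring(V, {"ab": 1, "bc": 1, "cd": 1})   # G_1 is a P4
path_two_colors = coloring(V, {"ab": 1, "bc": 2, "cd": 1})  # triangle a, b, c
star = coloring(V, {"ab": 1, "ac": 1, "ad": 1})
print(satisfies_theorem_2(V, path_one_color),
      satisfies_theorem_2(V, path_two_colors),
      satisfies_theorem_2(V, star))
```

The first two examples fail (U3') and (U2'), respectively, while the star passes both conditions.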

Assume now that we are given an arbitrary graph $G = (V, E)$ without edge
colors and we want to represent the topology of $G$ as a tree. The following
question then arises:

Under which conditions is it possible to define a coloring on the edges
and non-edges of $G$ so that edges $e \in E$ obtain a different color than the
non-edges $e \in E^c$ of $G$ and, in particular, so that the resulting map $\delta$ is a
symbolic ultrametric?

In other words, we ask for an edge-coloring of $G$ so that there is a tree $(T, t)$
with $t(\mathrm{lca}_T(x, y)) = m$ if and only if the (non-)edge $[x, y]$ obtained color $m$, and
so that edges and non-edges of $G$ can be distinguished by this coloring, that is,
edges and non-edges never obtain the same color. For an example of such an
edge-colored graph $G$, see Figure 1. The following theorem gives necessary and
sufficient conditions on the structure of graphs $G$ for which one can find such
a coloring and, in addition, provides a new characterization of cographs.

Theorem 3 Let $G = (V, E)$ be an arbitrary (possibly disconnected) graph,
$W = \{(x, y) \in V \times V \mid [x, y] \in E\}$ and
$W^c = \{(x, y) \in V \times V \mid [x, y] \notin E\}$.

There is a symbolic ultrametric $\delta\colon V \times V \to M^{\odot}$ s.t.
$\delta(W) \cap \delta(W^c) = \emptyset$ if and only if $G$ is a cograph.



Tree Representations of Graphs 


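[Figure 1: an edge-colored graph G with a legend assigning the labels A, B, C and D to colors (A marks non-edges), and the event-labeled tree (T, t).]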

Fig. 1 Shown is a disconnected edge-colored graph $G$ in the lower left part. The edge colors
are identified with the labels B, C and D, as indicated in the upper left part. Non-edges
are identified with the label A. It is easy to verify that the event-labeled tree $(T, t)$ on the
right-hand side is a tree representation of $G$, since for all distinct leaves $i$ and $j$ we have
$\mathrm{lca}_T(i, j) = X \in \{A, B, C, D\}$ if and only if the (non-)edge $[i, j]$ has the color identified with
the respective label $X$.

In particular, $G$ is a cograph and its cotree representation can be obtained by replacing the
label A by 0, all other labels B, C, D by 1, and additionally contracting the interior edges
[C, B] and [C, D].


Proof First assume that $G$ is a cograph. Set $\delta(x, x) = \odot$ for all $x \in V$ and set
$\delta(x, y) = \delta(y, x) = 1$ if $[x, y] \in E$ and, otherwise, to 0. Hence, Conditions (U0)
and (U1) are fulfilled. Moreover, by construction $|M| = 2$ and thus Condition
(U2') is trivially fulfilled. Furthermore, since $G_1(\delta)$ and its complement
$G_0(\delta)$ are cographs, (U3') is satisfied. Theorem 2 implies that $\delta$ is a symbolic
ultrametric.

Now, let $\delta\colon V \times V \to M^{\odot}$ be a symbolic ultrametric with
$\delta(W) \cap \delta(W^c) = \emptyset$. Assume for contradiction that $G$ is not a cograph.
Then $G$ contains an induced path $P_4 = a - b - c - d$. Therefore, at least one edge $e$
of this $P_4$ must obtain a color $\delta(e)$ different from the other two edges contained
in this $P_4$, as otherwise $G_{\delta(e)}(\delta)$ is not a cograph and thus $\delta$ is not a
symbolic ultrametric (Thm. 2 (U3')). For all such possible maps $\delta$ "subdividing" this
$P_4$ we always obtain that two edges of at least one of the underlying paths
$P_3 = a - b - c$ or $b - c - d$ must have different colors. W.l.o.g. assume that
$\delta(a, b) \ne \delta(b, c)$. Since $[a, c] \notin E$ and $\delta(W) \cap \delta(W^c) = \emptyset$ we can
conclude that $\delta(a, c) \ne \delta(a, b)$ and $\delta(a, c) \ne \delta(b, c)$. But then Condition
(U2') cannot be satisfied, and Theorem 2 implies that $\delta$ is not a symbolic
ultrametric. □

Theorem 3 implies that there is no hope of finding an edge-distinguishing
map $\delta$ for a graph $G$ that assigns symbols or colors to edges, resp., non-edges
such that for $\delta$ (and hence for $G$) there is a symbolic representation $(T, t)$,
unless $G$ is already a cograph. However, this result does not come as a big
surprise, as a cograph $G$ is characterized by the existence of a unique (up to
isomorphism) cotree $(T', t')$ representing the topology of $G$. As a consequence
of this result we can infer that any symbolic representation $(T, t)$ of a cograph
$G$ is a refinement of the cotree representation of $G$, that is, the cotree
representation $(T', t')$ of $G$ can be obtained from the symbolic representation
$(T, t)$ of $\delta$ by the following procedure:









Marc Hellmuth, Nicolas Wieseke 


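First reset for each $v \in V$,

$$t(v) = \begin{cases}
\odot & \text{if } v \in V \setminus V^0, \text{ i.e., } v \text{ is a leaf} \\
1 & \text{if } v = \mathrm{lca}_T(x, y) \text{ and } \delta(x, y) \in \delta(W), \text{ i.e., } [x, y] \text{ is an edge in } G \\
0 & \text{else, i.e., } [x, y] \text{ is not an edge in } G
\end{cases}$$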


Clearly, this new map $t$ on the tree $T$ defines a symbolic representation $(T, t)$
of the cograph $G = (V, E)$ so that $[x, y] \in E$ if and only if $t(\mathrm{lca}_T(x, y)) = 1$.
However, it might be possible that there is an edge $e = [u, v] \in E^0(T)$ such
that $t(u) = t(v)$, and therefore $(T, t)$ is not a cotree representation. In this
case, identify a new vertex $v_e$ with $e$ and define the tree $T_e = (V_e, E_e)$ with
vertex set $V_e = V(T) \setminus \{u, v\} \cup \{v_e\}$ and edge set
$E_e = E(T) \setminus \{e\} \cup \{[v_e, w] : [w, u] \text{ or } [w, v] \in E(T)\}$, that is, $T_e$ is
obtained from $T$ by contracting the edge $e$ into $v_e$; it is again a rooted tree.
Define for all $w \in V_e$ the map

$$t_e(w) = t(w) \text{ if } w \ne v_e \quad \text{and} \quad t_e(v_e) = t(u). \tag{2}$$

This construction can be repeated, with $(T_e, t_e)$ now playing the role of $(T, t)$,
until we end in a rooted tree $\widehat{T} = (\widehat{V}, \widehat{E})$ with a map
$\widehat{t}\colon \widehat{V} \to M^{\odot}$ so that for all edges $[u, v] \in \widehat{E}^0$
it holds that $\widehat{t}(u) \ne \widehat{t}(v)$.
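
The two steps of the procedure, relabeling and repeated contraction, can be sketched in Python as follows. The nested-tuple encoding of trees, the function names `relabel` and `contract`, and the example tree are assumptions made for illustration; leaves are kept as plain names instead of carrying the label $\odot$.

```python
def relabel(node, edge_symbols):
    """First step: inner labels become 1 (edge symbol) or 0 (non-edge symbol)."""
    if isinstance(node, str):              # leaves keep their name
        return node
    label, children = node
    return (int(label in edge_symbols),
            [relabel(child, edge_symbols) for child in children])


def contract(node):
    """Second step: contract inner edges whose end vertices share a label."""
    if isinstance(node, str):
        return node
    label, children = node
    merged = []
    for child in map(contract, children):
        if not isinstance(child, str) and child[0] == label:
            merged.extend(child[1])        # the edge [node, child] is contracted
        else:
            merged.append(child)
    return (label, merged)


# Hypothetical example: A marks non-edges, B and C mark edges.
tree = ("A", [("B", ["a", ("C", ["b", "c"])]), "d"])
print(contract(relabel(tree, {"B", "C"})))
```

Because each child is processed before it is compared with its parent, a single bottom-up pass suffices here, and no inner edge with equal labels remains in the result.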


With this procedure, we obtain a symbolic representation $(\widehat{T}, \widehat{t})$ of the
cograph $G$, also known as a so-called discriminating symbolic representation [15].
In particular, this representation $(\widehat{T}, \widehat{t})$ is unique (up to isomorphism) [15, cf.
Prop. 1] and, by construction, satisfies the condition of a cotree representation.
Moreover, since the cotree representation $(T', t')$ is unique (up to isomorphism),
it follows that $(T', t')$ and $(\widehat{T}, \widehat{t})$ must be isomorphic. We summarize
this result in the following corollary.

Corollary 1 Let $G = (V, E)$ be a cograph, $(T', t')$ be the corresponding cotree
representation, and $W$, resp., $W^c$ as defined in Theorem 3. Moreover, assume
that there is a symbolic ultrametric $\delta\colon V \times V \to M^{\odot}$ s.t.
$\delta(W) \cap \delta(W^c) = \emptyset$ with $(T, t)$ being the corresponding symbolic
representation of $\delta$.

Assume that the pair $(\widehat{T}, \widehat{t})$ is obtained from $(T, t)$ by application of the
procedure above. Then $(\widehat{T}, \widehat{t})$ and $(T', t')$ are isomorphic.

Assume that we want to find a symbolic ultrametric that can distinguish
between “most of” the edges and/or non-edges, however, the given graph is
a non-cograph $G = (V, E)$. Then, we are immediately left with the following
problems.

Problem Cograph Editing/Deletion/Completion

Input: Given a simple graph $G = (V, E)$ and an integer $k$.

Question: Is there a cograph $G' = (V, E')$ s.t.

$E' \subseteq \binom{V}{2}$ and $|E \triangle E'| \le k$ (Editing),

$E' \subseteq E$ and $|E \setminus E'| \le k$ (Deletion), or
$E \subseteq E'$ and $|E' \setminus E| \le k$ (Completion)?

However, the (decision version of the) problem to edit a given graph $G$
into a cograph $G'$, and thus to find the closest graph $G'$ that has a symbolic
representation, is NP-complete [25, 26]. In addition, the problems of deciding
whether there is a cograph $G'$ resulting from adding, resp., removing $k$ edges
from $G$ are NP-complete as well.

Theorem 4 (Liu et al. [26], El-Mallah and Colbourn) Cograph Editing,
Cograph Completion and Cograph Deletion are NP-complete.

In what follows, we will consider and discuss two modifications of the problem
of finding a symbolic ultrametric that can distinguish between edges and
non-edges in Sections 4 and 5.

1. In Section 4 we consider a couple of problems which are of high practical
relevance: the symbolic ultrametric editing, completion and deletion
problems.

2. In contrast, if a graph $G$ without edge colors is not a cograph and thus
there is no single tree representation of $G$, then we ask for the minimum
number of trees that are needed in order to represent the topology of $G$ in
an unambiguous way, see Section 5.


4 Symbolic Ultrametric Editing, Completion and Deletion 

Symbolic ultrametrics lie at the heart of many problems in phylogenomics. 
Phylogenetic Reconstructions are concerned with the study of the evolutionary 
history of groups of systematic biological units, e.g. genes or species. The 
objective is the assembling of so-called phylogenetic trees or networks that 
represent a hypothesis about the evolutionary ancestry of a set of genes, species 
or other taxa. 

Genes are passed from generation to generation to the offspring. Some of
those genes are frequently duplicated, mutate or get lost, a mechanism that
also ensures that new species can evolve. Crucial for the evolutionary
reconstruction of species history is the knowledge of the relationship between
the respective genes. Genes that share a common origin (homologs) are divided
into three classes, namely orthologs, paralogs, and xenologs. Two homologous
genes are orthologous if at their most recent point of origin the ancestral
gene complement is transmitted to two daughter lineages; a speciation event
happened. They are paralogous if the ancestor gene at their most recent point
of origin was duplicated within a single ancestral genome; a duplication event
happened. Horizontal gene transfer (HGT) refers to the transfer of genes
between organisms in a manner other than traditional reproduction and across
different species; if such an event happened at the most recent point of origin
of two genes, then they are called xenologous. Intriguingly, there are practical
sequence-based methods that allow one to determine whether two genes $x$ and $y$
are orthologs or not with acceptable accuracy without constructing either gene
or species trees.

Now, assume we are given an estimate of genes being orthologs, paralogs
or even xenologs, that is, a map
$d\colon X \times X \to \{\text{speciation}, \text{duplication}, \text{HGT}\}$.
Then, one is interested in the representation of these estimates as a tree $T$
with event-labeling $t$ so that $t(\mathrm{lca}(x, y)) = \text{speciation}$ iff $x, y$ are
orthologs, $t(\mathrm{lca}(x, y)) = \text{duplication}$ iff $x, y$ are paralogs and
$t(\mathrm{lca}(x, y)) = \text{HGT}$ iff $x, y$ are xenologs. In practice, however, such maps
$d$ are often only estimates of the true evolutionary relationship $\delta$ between
the investigated genes. Thus, in general such estimates $d$ will not be a symbolic
ultrametric. Hence, there is a big interest in optimally editing $d$ to a symbolic
ultrametric $\delta$.

The problem of editing a given symmetric map $d\colon X \times X \to M^{\odot}$ to a
symbolic ultrametric is defined as follows:


Problem Symbolic Ultrametric Editing

Input: Given a symmetric map $d\colon X \times X \to M^{\odot}$ s.t.
$d(x, y) = \odot$ if and only if $x = y$, and an integer $k$.

Question: Is there a symbolic ultrametric $\delta\colon X \times X \to M^{\odot}$ s.t. for
$D = \{(x, y) \in X \times X \mid d(x, y) \ne \delta(x, y)\}$ we have $|D| \le k$?

A further problem arising from the latter considerations is as follows.
Assume we have a symmetric subset $R$ of $X \times X$ so that for all
$(x, y) \in R$ the assignment $d(x, y)$ is believed to be a reliable estimate,
which is therefore not allowed to be changed. Moreover, let
$X \times X \setminus R$ be the pairs $(x, y)$ for which an assignment $d(x, y)$ is not
known. Assume that $M = \{1, \dots, n\}$ and $M^{\odot} = M \cup \{\odot, 0\}$; then we can
extend the map $d\colon X \times X \to M^{\odot}$ so that

$$d(x, y) = \begin{cases}
\odot & \text{if } x = y \\
d(x, y) & \text{if } (x, y) \in R \\
0 & \text{if } (x, y) \in X \times X \setminus R
\end{cases}$$

We then ask to change the assignment of a minimum number of pairs $(x, y)$
with $d(x, y) = 0$ to some element $m \in M$, $m \ne 0$, so that the resulting map
is a symbolic ultrametric. In other words, only non-reliable estimates of pairs
$(x, y)$ are allowed to be changed.


Problem Symbolic Ultrametric Completion

Input: Given a symmetric map $d\colon X \times X \to M^{\odot}$ s.t.
$d(x, y) = \odot$ if and only if $x = y$, and an integer $k$.

Question: Is there a symbolic ultrametric $\delta\colon X \times X \to M^{\odot}$ s.t.
if $d(x, y) \ne 0$, then $\delta(x, y) = d(x, y)$; and $|D| \le k$, where
$D = \{(x, y) \in X \times X \mid d(x, y) \ne \delta(x, y)\}$?

Conversely, one might ask to change a minimum number of assignments
$d(x, y) \ne 0$ to $\delta(x, y) = 0$.


Problem Symbolic Ultrametric Deletion

Input: Given a symmetric map $d\colon X \times X \to M^{\odot}$ s.t.
$d(x, y) = \odot$ if and only if $x = y$, and an integer $k$.

Question: Is there a symbolic ultrametric $\delta\colon X \times X \to M^{\odot}$ s.t.
$\delta(x, y) = d(x, y)$ or $\delta(x, y) = 0$; and $|D| \le k$, where
$D = \{(x, y) \in X \times X \mid d(x, y) \ne \delta(x, y)\}$?
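
The three problems differ only in which pairs may be changed. The following Python sketch illustrates the set $D$ and the side constraints of Completion and Deletion for a candidate $\delta$; it does not test whether $\delta$ is a symbolic ultrametric. The function names, the helper `symmetric` and the example values are hypothetical, and diagonal pairs are omitted for brevity.

```python
def edit_set(d, delta):
    """D: the ordered pairs on which d and delta differ."""
    return {pair for pair in d if d[pair] != delta[pair]}


def is_completion(d, delta, unknown=0):
    """Completion: only pairs with d(x, y) = unknown may be changed."""
    return all(d[pair] == unknown for pair in edit_set(d, delta))


def is_deletion(d, delta, unknown=0):
    """Deletion: changed pairs may only be reset to the symbol unknown."""
    return all(delta[pair] == unknown for pair in edit_set(d, delta))


def symmetric(values):
    """Helper: extend a map on unordered pairs to both orderings."""
    return {(x, y): s for (a, b), s in values.items()
            for x, y in ((a, b), (b, a))}


d = symmetric({("a", "b"): 1, ("a", "c"): 0, ("b", "c"): 2})
delta = symmetric({("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 2})
print(len(edit_set(d, delta)), is_completion(d, delta), is_deletion(d, delta))
```

Since $D$ consists of ordered pairs, changing one unordered pair contributes two elements to $D$.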


4.1 Computational Complexity


In this section, we prove the NP-completeness of Symbolic Ultrametric
Editing, Symbolic Ultrametric Completion and Symbolic Ultrametric
Deletion.


Theorem 5 Symbolic Ultrametric Editing is NP-complete. 


Proof Given a symmetric map $\delta$, it can be verified in polynomial time whether $\delta$ is
a symbolic ultrametric: one can check Conditions (U2) and (U3) individually
for each of the $O(|X|^3)$ many combinations of $\{x, y, z\} \in \binom{X}{3}$ for (U2), and the
$O(|X|^4)$ many combinations of $\{x, y, u, v\} \in \binom{X}{4}$ for (U3), respectively. Hence,
Symbolic Ultrametric Editing $\in$ NP. We will show by reduction from
Cograph Editing that Symbolic Ultrametric Editing is NP-hard.

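Let $G = (V, E)$ be an arbitrary simple graph. We associate with $G$ a map
$d\colon V \times V \to M^{\odot}$, where $M = \{0, 1, \dots, n\}$ is a non-empty finite set s.t.
$n \ge 1$ and thus $0, 1 \in M$. Let $M^{\odot} := M \cup \{\odot\}$ and set for all $x, y \in V$:

$$d(x, y) = d(y, x) = \begin{cases}
\odot & \text{if } x = y \\
1 & \text{if } [x, y] \in E \\
0 & \text{if } [x, y] \notin E
\end{cases}$$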


Obviously, $d$ can be constructed in polynomial time. In the following, we show
that, given an integer $k$, there exists a solution of the Cograph Editing
problem for $G$ and integer $k$ if and only if there exists a solution of the
Symbolic Ultrametric Editing problem for $d$ and integer $2k$.

First, we show that a solution of the Symbolic Ultrametric Editing
problem for $d$ and $2k$ can be constructed from a solution of the Cograph
Editing problem for $G$ and $k$. Let $G' = (V, E')$ be a cograph with $|E \triangle E'| \le k$.
Furthermore, let $\delta\colon V \times V \to M^{\odot}$ be a map such that for all $x, y \in V$,

$$\delta(x, y) = \begin{cases}
\odot & \text{if } x = y \\
1 & \text{if } [x, y] \in E' \\
0 & \text{if } [x, y] \notin E'
\end{cases}$$

It is easy to verify that $\delta$ is a symbolic ultrametric by application of Theorem
2. It remains to show that for $D = \{(x, y) \in V \times V \mid d(x, y) \ne \delta(x, y)\}$ it
holds that $|D| \le 2k$. Note that for all $x \in V$ we have $d(x, x) = \delta(x, x) = \odot$
and therefore $(x, x) \notin D$. The set $D$ can be partitioned into the two subsets

$D_1 = \{(x, y) \mid d(x, y) = 1 \wedge \delta(x, y) = 0\}$ and
$D_2 = \{(x, y) \mid d(x, y) = 0 \wedge \delta(x, y) = 1\}$.

Hence, $(x, y) \in D_1$ if and only if $[x, y] \in E \setminus E'$, and $(x, y) \in D_2$ if and only if
$[x, y] \in E' \setminus E$. As $(E \setminus E') \cup (E' \setminus E) = E \triangle E'$ it holds that $(x, y) \in D$ if and
only if $[x, y] \in E \triangle E'$. As $d$ and $\delta$ are symmetric, it also holds that $(x, y) \in D$
if and only if $(y, x) \in D$. Hence, $[x, y] \in E \triangle E'$ if and only if $(x, y) \in D$
and $(y, x) \in D$. This reflects the fact that an edge edit $[x, y] \in E \triangle E'$ in
$G$ corresponds to the two symmetric edits $(x, y), (y, x) \in D$ in $d$. Therefore,
$|D| = |\{(x, y) \mid d(x, y) \ne \delta(x, y)\}| = 2|E \triangle E'| \le 2k$.
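
The counting argument can be checked on a small instance. The Python sketch below builds the maps $d$ and $\delta$ from a graph $G$ and an edited graph $G'$ exactly as in the construction above and compares $|D|$ with $2|E \triangle E'|$; the encoding of the special symbol as `DOT`, the function name `map_from_graph` and the concrete example are assumptions.

```python
DOT = "dot"  # assumed encoding of the special symbol


def map_from_graph(V, E):
    """The map associated with a graph: DOT on the diagonal, 1 on edges, 0 else."""
    return {(x, y): DOT if x == y else int(frozenset((x, y)) in E)
            for x in V for y in V}


V = "abcd"
E = {frozenset(p) for p in ("ab", "bc", "cd")}   # the path a-b-c-d, not a cograph
E_prime = {frozenset(p) for p in ("ab", "cd")}   # a cograph after one edge deletion
d, delta = map_from_graph(V, E), map_from_graph(V, E_prime)
D = {pair for pair in d if d[pair] != delta[pair]}
print(len(D), 2 * len(E ^ E_prime))
```

Both printed numbers agree, since the single edited edge $[b, c]$ appears in $D$ as the two ordered pairs $(b, c)$ and $(c, b)$.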

We continue to show that a solution of the Cograph Editing problem for
$G$ and $k$ can be constructed from a solution of the Symbolic Ultrametric
Editing problem for $d$ and $2k$. Let $\delta\colon V \times V \to M^{\odot}$ be a symbolic ultrametric
s.t. $|D| = |\{(x, y) \mid d(x, y) \ne \delta(x, y)\}| \le 2k$. Furthermore, let $G' = (V, E')$ be
a simple graph such that for all $x, y \in V$ it holds that $[x, y] \in E'$ if and only
if $\delta(x, y) = 1$. By Theorem 2 (U3') we have that $G' = G_1$ and hence $G'$ is a
cograph. It remains to show that $|E \triangle E'| \le k$. By construction, for all $x \in V$,
$d(x, x) = \delta(x, x) = \odot$ and $[x, x] \notin E \triangle E'$. Let $D = \{(x, y) \mid d(x, y) \ne \delta(x, y)\}$.
Note that for all distinct $x, y \in V$ it holds that $d(x, y) \in \{0, 1\}$. Hence, $D$ can
be partitioned into the four subsets

$D_1 = \{(x, y) \mid d(x, y) = 1 \wedge \delta(x, y) = 0\}$,

$D_2 = \{(x, y) \mid d(x, y) = 0 \wedge \delta(x, y) = 1\}$,

$D_3 = \{(x, y) \mid d(x, y) = 1 \wedge \delta(x, y) \in M^{\odot} \setminus \{0, 1\}\}$, and

$D_4 = \{(x, y) \mid d(x, y) = 0 \wedge \delta(x, y) \in M^{\odot} \setminus \{0, 1\}\}$.

For these subsets of $D$ it holds that if $(x, y) \in D_1$ then $[x, y] \in E \setminus E'$, and if
$(x, y) \in D_2$ then $[x, y] \in E' \setminus E$. Furthermore, $\delta(x, y) \in M^{\odot} \setminus \{0, 1\}$ implies
that $[x, y] \notin E'$ and it follows that if $(x, y) \in D_3$ then $[x, y] \in E \setminus E'$, and if
$(x, y) \in D_4$ then $[x, y] \notin E \wedge [x, y] \notin E'$. For all remaining $x, y \in V$, i.e., for
which $d(x, y) = \delta(x, y)$, it holds that $[x, y] \notin E \setminus E'$ and $[x, y] \notin E' \setminus E$. It
follows that $[x, y] \in E \setminus E'$ if and only if $(x, y) \in D_1 \cup D_3$, and $[x, y] \in E' \setminus E$
if and only if $(x, y) \in D_2$. As before, due to the symmetry of the maps $d$
and $\delta$, two symmetric edits $(x, y), (y, x) \in D$ in $d$ correspond to at most one
edge edit $[x, y] \in E \triangle E'$ in $G$. Finally,
$2|E \triangle E'| = 2|E \setminus E'| + 2|E' \setminus E| = |D_1 \cup D_3| + |D_2| \le |D| \le 2k$.
Hence, $|E \triangle E'| \le k$.

Thus, Symbolic Ultrametric Editing is NP-complete. □ 

Theorem 6 Symbolic Ultrametric Completion is NP-complete. 

Proof It is shown analogously as in the proof of Theorem 5 that
Symbolic Ultrametric Completion $\in$ NP. We will show by reduction
from Cograph Completion that Symbolic Ultrametric Completion is NP-hard.

Let $G = (V, E)$ be an arbitrary simple graph. We associate with $G$ a map
$d\colon V \times V \to M^{\odot}$ as defined in the proof of Theorem 5:

$$d(x, y) = \begin{cases}
\odot & \text{if } x = y \\
1 & \text{if } [x, y] \in E \\
0 & \text{if } [x, y] \notin E
\end{cases}$$
Let there be a solution $G' = (V,E')$ for the Cograph Completion problem
for $G$ and $k$, i.e., $E \subseteq E'$ and $|E' \setminus E| \leq k$. We show that there is
a solution for the Symbolic Ultrametric Completion problem for $d$ and
$2k$. Define the map $\delta\colon V \times V \to M^{\odot}$ as in the proof of Theorem 5, i.e., as
the map associated to $G'$:

\[
\delta(x,y) =
\begin{cases}
\odot & \text{if } x = y,\\
1 & \text{if } [x,y] \in E',\\
0 & \text{if } [x,y] \notin E'.
\end{cases}
\]

Again, it is easy to verify that $\delta$ is a symbolic ultrametric by application
of Theorem 2. Moreover, by construction $\delta(x,y) = d(x,y)$ for all $x,y \in V$
whenever $[x,y] \in E \subseteq E'$ and hence, for all $x,y \in V$ with $d(x,y) \neq 0$.

It remains to show that for $D = \{(x,y) \in V \times V \mid 0 = d(x,y) \neq \delta(x,y)\}$
it holds that $|D| \leq 2k$. Note that for all $x \in V$ we have $d(x,x) = \delta(x,x) = \odot$
and therefore $(x,x) \notin D$. Moreover,

\[
D = \{(x,y) \mid d(x,y) = 0 \wedge \delta(x,y) = 1\}.
\]

Hence, $(x,y), (y,x) \in D$ if and only if $[x,y] \in E' \setminus E$. Therefore,
$|D| = 2|E' \setminus E| \leq 2k$.

We continue to show that a solution of the Cograph Completion problem for
$G$ and $k$ can be constructed from a solution of the Symbolic Ultrametric
Completion problem for $d$ and $2k$. Let $\delta\colon V \times V \to M^{\odot}$ be a symbolic ultrametric
s.t. $|D| \leq 2k$ and $\delta(x,y) = d(x,y)$ if $d(x,y) \neq 0$. Furthermore, let $G' = (V,E')$
be a simple graph such that for all $x,y \in V$ it holds that $[x,y] \in E'$ if and
only if $\delta(x,y) = 1$. By Theorem 2 (U3') we have that $G' = G_1$ and hence,
$G'$ is a cograph. It remains to show that $|E' \setminus E| \leq k$. By construction, for
all $x \in V$, $d(x,x) = \delta(x,x) = \odot$ and $[x,x] \notin E'$. Note that for all distinct
$x,y \in V$ it holds for the map associated to $G$ that $d(x,y) \in \{0,1\}$. Hence, $D$
can be partitioned into

\[
\begin{aligned}
D_1 &= \{(x,y) \mid d(x,y) = 0 \wedge \delta(x,y) = 1\}, \text{ and}\\
D_2 &= \{(x,y) \mid d(x,y) = 0 \wedge \delta(x,y) \in M^{\odot} \setminus \{0,1\}\}.
\end{aligned}
\]

Thus, if $(x,y), (y,x) \in D_1$, then $[x,y] \in E' \setminus E$. Therefore, $2|E' \setminus E| = |D_1| \leq
|D| \leq 2k$ and thus, $|E' \setminus E| \leq k$.

Hence, Symbolic Ultrametric Completion is NP-complete. □ 


Using similar arguments as in the proof of Theorem 5, we can infer the
NP-completeness of Symbolic Ultrametric Deletion by reduction from
Cograph Deletion.

Theorem 7 Symbolic Ultrametric Deletion is NP-complete. 

4.2 Integer Linear Program 

We showed in m that the cograph editing problem is amenable to formu¬ 
lations as Integer Linear Program (ILP). We will extend these results here 
to solve the symbolic ultrametric editing/completion/deletion problem. Let
$d\colon X \times X \to M^{\odot}$ be an arbitrary symmetric map with $M = \{0,1,\dots,n\}$ and
$K_{|X|} = (X, E = \binom{X}{2})$ be the corresponding complete graph with edge-coloring
s.t. each edge $[x,y] \in E$ obtains color $d(x,y) = d(y,x)$.

For each of the three problems and hence, for a given symmetric map $d$, we
define for each distinct $x,y \in X$ and $i \in M$ the binary constants $\mathfrak{d}^i_{x,y}$ with
$\mathfrak{d}^i_{x,y} = 1$ if and only if $d(x,y) = i$. Moreover, we define the binary variables
$E^i_{x,y}$ for all $i \in M$ and $x,y \in X$ that reflect the coloring of the edges in $K_{|X|}$ of
the final symbolic ultrametric $\delta$, i.e., $E^i_{x,y}$ is set to 1 if and only if $\delta(x,y) = i$.

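In order to find the closest symbolic ultrametric $\delta$, the objective function
is to minimize the symmetric difference of $d$ and $\delta$ among all different
symbols $i \in M$:

\[
\min \sum_{i \in M} \Biggl( \sum_{\substack{(x,y) \in X \times X\\ x \neq y}} (1 - \mathfrak{d}^i_{x,y})\, E^i_{x,y}
 + \sum_{\substack{(x,y) \in X \times X\\ x \neq y}} \mathfrak{d}^i_{x,y}\, (1 - E^i_{x,y}) \Biggr) \tag{3}
\]

The same objective function can be used for the symbolic ultrametric completion
and deletion problem.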

In case of the symbolic ultrametric completion we must ensure that
$\delta(x,y) = d(x,y)$ for all $d(x,y) \neq 0$. Hence, we set for all $x,y$ with
$d(x,y) = i \neq 0$:

\[
E^i_{x,y} = 1. \tag{4}
\]

In case of the symbolic ultrametric deletion we must ensure that $\delta(x,y) =
d(x,y)$ or $\delta(x,y) = 0$ or, in other words, for all $d(x,y) = i \neq 0$ it must hold
that either $E^i_{x,y} = 1$ or $E^0_{x,y} = 1$. Hence, we set for all distinct $x,y \in X$:

\[
E^0_{x,y} = 1 \text{ if } d(x,y) = 0, \quad\text{and}\quad E^i_{x,y} + E^0_{x,y} = 1 \text{ with } i = d(x,y), \text{ else.} \tag{4'}
\]

For the editing problem we need neither Constraint (4) nor (4'). However,
for all three problems we need the following.

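Each tuple $(x,y)$ with $x \neq y$ has exactly one value $i \in M$ assigned to it,
and this assignment is symmetric, which is expressed in the following constraint.

\[
\sum_{i \in M} E^i_{x,y} = 1 \quad\text{and}\quad E^i_{x,y} - E^i_{y,x} = 0 \quad \text{for all distinct } x,y \in X \text{ and all } i \in M. \tag{5}
\]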

In order to satisfy Condition (U2') and thus, that all triangles have
at most two colors on the edges, we need this constraint.

\[
E^i_{x,y} + E^j_{y,z} + E^k_{x,z} \leq 2 \tag{6}
\]

for all ordered tuples $(i,j,k)$ of distinct $i,j,k \in M$ and pairwise distinct
$x,y,z \in X$.

Finally, in order to satisfy Condition (U3') and thus, that each monochromatic
subgraph comprising all edges with fixed color $i$ is a cograph, we
need the following constraint that forbids induced $P_4$'s.

\[
E^i_{x,y} + E^i_{y,u} + E^i_{u,v} - E^i_{x,u} - E^i_{x,v} - E^i_{y,v} \leq 2 \tag{7}
\]

for all $i \in M$ and all ordered tuples $(x,y,u,v)$ of distinct $x,y,u,v \in X$.

It is easy to verify that the latter ILP formulation needs $O(|M||X|^2)$ variables
and $O(|M|^3|X|^3 + |M||X|^4)$ constraints.
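
To make the roles of the objective and of the two structural constraints concrete, the following Python sketch evaluates them for a given candidate map instead of solving the ILP. It is an illustration under assumptions: the function names are hypothetical, maps are stored as dictionaries on ordered pairs, and the induced-$P_4$ constraint is assumed to have the same form as Constraint (12) in Section 5.2.

```python
from itertools import combinations, permutations


def symmetric_map(values):
    """Build a symmetric map from colors given for unordered pairs."""
    m = {}
    for (x, y), color in values.items():
        m[(x, y)] = m[(y, x)] = color
    return m


def objective(X, M, d, delta):
    """Evaluate objective (3) on the 0/1 encodings of d and delta."""
    total = 0
    for i in M:
        for x, y in permutations(X, 2):
            d_i = int(d[(x, y)] == i)      # binary constant for d
            e_i = int(delta[(x, y)] == i)  # binary variable E^i_{x,y}
            total += (1 - d_i) * e_i + d_i * (1 - e_i)
    return total


def satisfies_u2(X, delta):
    """Constraint (6): every triangle carries at most two colors."""
    return all(
        len({delta[(x, y)], delta[(y, z)], delta[(x, z)]}) <= 2
        for x, y, z in combinations(X, 3)
    )


def satisfies_u3(X, M, delta):
    """Induced-P4 constraint: no color class contains an induced path x-y-u-v."""
    for i in M:
        def e(a, b):
            return int(delta[(a, b)] == i)
        for x, y, u, v in permutations(X, 4):
            if e(x, y) + e(y, u) + e(u, v) - e(x, u) - e(x, v) - e(y, v) > 2:
                return False
    return True


X = ["a", "b", "c"]
M = [0, 1, 2]
d = symmetric_map({("a", "b"): 0, ("a", "c"): 1, ("b", "c"): 2})
delta = symmetric_map({("a", "b"): 0, ("a", "c"): 1, ("b", "c"): 1})

print(satisfies_u2(X, d), satisfies_u2(X, delta), objective(X, M, d, delta))
```

In this toy instance the rainbow triangle $d$ violates (U2'), while $\delta$ satisfies both conditions. The objective value is 4, because each of the two ordered pairs $(b,c)$ and $(c,b)$ contributes once for the symbol that is removed and once for the symbol that is added.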


5 Cotree Representation and Cograph k-Decomposition

If a given non-edge-colored graph $G$ is not a cograph, then Theorem 3 implies
that one cannot define an edge-distinguishing symbolic ultrametric, and thus,
in particular no single tree representation of $G$. Therefore, we are interested in
representing the topology of $G$ in an unambiguous way with a minimum number
of trees.

Recall that a graph $G = (V,E)$ is represented by a set of cotrees $\mathcal{T} =
\{T_1,\dots,T_k\}$ if and only if for each edge $[x,y] \in E$ there is a tree $T_i \in \mathcal{T}$ with
$t(\mathrm{lca}_{T_i}(x,y)) = 1$.

Note, by definition, each cotree $T_i$ determines a subset $E_i = \{[x,y] \in E \mid
t(\mathrm{lca}_{T_i}(x,y)) = 1\}$ of $E$. Hence, the subgraph $(V,E_i)$ of $G$ must be a cograph.
Therefore, in order to find the minimum number of cotrees representing a
graph $G$, we can equivalently ask for a decomposition $\Pi = \{E_1,\dots,E_k\}$ of $E$
so that each subgraph $(V,E_i)$ is a cograph, where $k$ is the least integer among
all cograph decompositions of $G$. Thus, we are dealing with the following two
equivalent problems.

Problem Cotree k-Representation

Input: Given a graph $G = (V,E)$ and an integer $k$.

Question: Can $G$ be represented by $k$ cotrees?

Problem Cograph k-Decomposition

Input: Given a graph $G = (V,E)$ and an integer $k$.

Question: Is there a cograph $k$-decomposition of $G$?
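
Verifying a proposed solution to these problems is straightforward, since a graph is a cograph exactly when it has no induced $P_4$. The following Python sketch checks a candidate family of edge classes by brute force. The function names are hypothetical, and the search is a plain enumeration rather than the linear-time cograph recognition available in the literature.

```python
def has_induced_p4(vertices, edges):
    """Return True if the graph (vertices, edges) contains an induced P4."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    for y in vertices:
        for u in adj[y]:                          # middle edge [y, u]
            for x in adj[y] - adj[u] - {u}:       # x ~ y, x not adjacent to u
                for v in adj[u] - adj[y] - {y}:   # v ~ u, v not adjacent to y
                    if v not in adj[x]:           # end vertices non-adjacent
                        return True
    return False


def is_cograph_decomposition(vertices, edges, classes, partition=False):
    """Check that the classes cover E and each class induces a cograph.

    With partition=True the classes must in addition be pairwise disjoint.
    """
    edge_set = {frozenset(e) for e in edges}
    class_sets = [{frozenset(e) for e in c} for c in classes]
    if set().union(*class_sets) != edge_set:
        return False
    if partition and sum(len(c) for c in class_sets) != len(edge_set):
        return False
    return not any(has_induced_p4(vertices, c) for c in class_sets)


# The path P4 is not a cograph but has a cograph 2-partition.
V = [1, 2, 3, 4]
P4 = [(1, 2), (2, 3), (3, 4)]
print(is_cograph_decomposition(V, P4, [P4]))
print(is_cograph_decomposition(V, P4, [[(1, 2), (2, 3)], [(3, 4)]], partition=True))
```

The first call fails because the single class is the path itself; the second succeeds because neither class contains a path on three edges.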

Clearly, any cograph has an optimal 1-decomposition, while for cycles of
length $> 4$ or paths $P_4$ there is always an optimal cograph 2-decomposition.
However, there are examples of graphs that do not even have a cograph
2-decomposition, see Figure 2. Moreover, as shown in Figure 3, the number of
different optimal cograph $k$-decompositions of a graph can grow exponentially.
The next theorem provides a non-trivial upper bound for the integer $k$ s.t. there
is still a cograph $k$-decomposition for arbitrary graphs.

Theorem 8 For every graph $G$ with maximum degree $\Delta$ there is a cograph
$k$-decomposition with $1 \leq k \leq \Delta + 1$ that can be computed in
$O(|V||E| + \Delta(|V| + |E|))$ time. Hence, any graph can be represented by at most
$\Delta + 1$ cotrees.

Proof Consider a proper edge-coloring $\varphi\colon E \to \{1,\dots,k\}$ of $G$, i.e., an edge
coloring such that no two incident edges obtain the same color. Any proper
edge-coloring using $k$ colors yields a cograph $k$-partition $\Pi = \{E_1,\dots,E_k\}$
where $E_i = \{e \in E \mid \varphi(e) = i\}$, because any connected component in $G_i =
(V,E_i)$ is an edge or an isolated vertex and thus, no $P_4$'s are contained in $G_i$.
Vizing's Theorem [32]



Fig. 2 Full enumeration of all possibilities (which we leave to the reader) shows that the
depicted graph has no cograph 2-decomposition. The existing cograph 3-decomposition is
also a cograph 3-partition, highlighted by dashed-lined, dotted and bold edges.





Fig. 3 Two isomorphic graphs with two non-equivalent optimal cograph 2-decompositions
(highlighted by dashed and solid edges) are shown in the upper part. By stepwise
identification of single vertices one obtains a chain of graphs $G$, see lower part. For each
subgraph that is a copy of the graph above, an optimal cograph 2-decomposition can be
determined almost independently of the remaining parts of the graph $G$. Hence, with an
increasing number of vertices of such chains $G$, the number of different cograph
2-decompositions grows exponentially.


implies that for each graph there is a proper edge-coloring using $k$ colors with
$\Delta \leq k \leq \Delta + 1$.

A proper edge-coloring using at most $\Delta + 1$ colors can be computed with
the Misra-Gries algorithm in $O(|V||E|)$ time [37]. Since the (at most $\Delta + 1$)
respective cotrees can be constructed in linear time $O(|V| + |E|)$ [8], we derive
the runtime $O(|V||E| + \Delta(|V| + |E|))$. □
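
The key step of the proof, namely that the color classes of a proper edge-coloring are matchings and hence trivially $P_4$-free, can be illustrated with a short Python sketch. Note the substitution: the sketch uses a simple greedy coloring rather than the Misra-Gries algorithm named above, so it may need up to $2\Delta - 1$ colors instead of $\Delta + 1$. All function names are hypothetical.

```python
def greedy_edge_coloring(edges):
    """Give each edge the smallest color not yet used at either endpoint."""
    used = {}      # vertex -> set of colors on incident edges
    coloring = {}  # edge (as frozenset) -> color
    for x, y in edges:
        taken = used.setdefault(x, set()) | used.setdefault(y, set())
        color = 1
        while color in taken:
            color += 1
        coloring[frozenset((x, y))] = color
        used[x].add(color)
        used[y].add(color)
    return coloring


def color_classes(coloring):
    """Group the edges by color, yielding the classes E_1, ..., E_k."""
    classes = {}
    for edge, color in coloring.items():
        classes.setdefault(color, set()).add(edge)
    return [classes[c] for c in sorted(classes)]


def is_matching(edge_class):
    """Return True if no two edges of the class share a vertex."""
    seen = set()
    for edge in edge_class:
        if seen & edge:
            return False
        seen |= edge
    return True


# Complete graph K4 with maximum degree 3.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
partition = color_classes(greedy_edge_coloring(K4))
print(len(partition), all(is_matching(c) for c in partition))
```

Every class returned is a matching, so each subgraph $(V,E_i)$ consists of isolated edges and vertices and is a cograph. The resulting partition is in general far from coarsest, as discussed next.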

Obviously, any optimal $k$-decomposition must also be a coarsest
$k$-decomposition, while the converse is in general not true, see Fig. 4. The
partition $\Pi = \{E_1,\dots,E_k\}$ obtained from a proper edge-coloring is usually not
a coarsest one, as possibly $(V,E_J)$ is a cograph, where $E_J = \bigcup_{i \in J} E_i$ and
$J \subseteq \{1,\dots,k\}$. However, there are graphs having an optimal cograph
$\Delta$-decomposition, see Fig. [3] and |3|. Thus, the derived bound $\Delta + 1$ is almost
sharp. Nevertheless, we assume that this bound can be sharpened:

Fig. 4 The shown (non-co)graph $G$ has a 2-decomposition $\Pi = \{E_1,E_2\}$. Edges in the
different elements $E_1$ and $E_2$ are highlighted by dashed and solid edges, respectively. Thus,
two cotrees, shown in the lower part of this picture, are sufficient to represent the structure
of $G$. The two cotrees are isomorphic, and thus, differ only in the arrangement of their leaf
sets. For this reason, we only depicted one cotree with two different leaf sets. Note, $G$ has
no 2-partition, but a coarsest 3-partition. The latter can easily be verified by application of
the construction in Lemma ^

Conjecture 1 For every graph $G$ with maximum degree $\Delta$ there is a cograph
$\Delta$-decomposition.

However, there are examples of non-cographs containing many induced $P_4$'s
that have a cograph $k$-decomposition with $k \ll \Delta + 1$, which implies that any
optimal $k$-decomposition of those graphs will have significantly fewer elements
than $\Delta + 1$, see the following examples.

Example 1 Consider the graph $G = (V,E)$ with vertex set $V = \{1,\dots,k\} \cup
\{a,b\}$ and $E = \{[i,j] \mid i,j \in \{1,\dots,k\}, i \neq j\} \cup \{[k,a],[a,b]\}$. The graph
$G$ is not a cograph, since there are induced $P_4$'s of the form $i - k - a - b$,
$i \in \{1,\dots,k-1\}$. On the other hand, the subgraph $H = (V, E \setminus \{[k,a]\})$ has
two connected components, one is isomorphic to the complete graph $K_k$ on
$k$ vertices and the other to the complete graph $K_2$. Hence, $H$ is a cograph.
Therefore, $G$ has a cograph 2-partition $\{E \setminus \{[k,a]\}, \{[k,a]\}\}$, independent
of $k$ and thus, independent of the maximum degree $\Delta = k$.

Example 2 Consider the $2n$-dimensional hypercube $Q_{2n} = (V,E)$ with maximum
degree $2n$. We will show that this hypercube has a coarsest cograph
$n$-partition $\Pi = \{E_1,\dots,E_n\}$, which implies that for any optimal cograph
$k$-decomposition of $Q_{2n}$ we have $k \leq \Delta/2$.

We construct now a cograph $n$-partition of $Q_{2n}$. Note, $Q_{2n} = \Box_{i=1}^{2n} K_2 =
\Box_{i=1}^{n} (K_2 \Box K_2) = \Box_{i=1}^{n} Q_2$. In order to avoid ambiguity, we write $\Box_{i=1}^{n} Q_2$ as
$\Box_{i=1}^{n} H_i$, $H_i \simeq Q_2$, and assume that $Q_2$ has edges $[0,1], [1,2], [2,3], [3,0]$.
The cograph $n$-partition of $Q_{2n}$ is defined as $\Pi = \{E_1,\dots,E_n\}$, where $E_i =
\bigcup_{v \in V} E(H_i^v)$. In other words, the edge sets of all $H_i$-layers in $Q_{2n}$ constitute a
single class $E_i$ in the partition for each $i$. Therefore, the subgraph $G_i = (V,E_i)$
consists of $4^{n-1}$ connected components, each of which is isomorphic to the
square $Q_2$. Hence, $G_i = (V,E_i)$ is a cograph.

Assume for contradiction that $\Pi = \{E_1,\dots,E_n\}$ is not a coarsest partition.
Then there are distinct classes $E_i$, $i \in I \subseteq \{1,\dots,n\}$, such that
$G_I = (V, \bigcup_{i \in I} E_i)$ is a cograph. W.l.o.g. assume that $1,2 \in I$ and let
$v = (0,\dots,0) \in V$. Then, the subgraph $H_1^v \cup H_2^v \subseteq Q_{2n}$ contains a path
$P_4$ with edges $[x,v] \in E(H_1^v)$ and $[v,a],[a,b] \in E(H_2^v)$, where $x = (1,0,\dots,0)$,
$a = (0,1,0,\dots,0)$ and $b = (0,2,0,\dots,0)$. By definition of the Cartesian product,
there are no edges connecting $x$ with $a$ or $b$, or $v$ with $b$, in $Q_{2n}$ and thus, this
path $P_4$ is induced. As this holds for all subgraphs $H_i^v \cup H_j^v$ ($i,j \in I$ distinct)
and thus, in particular for the graph $G_I$, we can conclude that classes of $\Pi$
cannot be combined. Hence, $\Pi$ is a coarsest cograph $n$-partition.
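
For small $n$ the construction in this example can be checked mechanically. The Python sketch below builds the Cartesian product of $n$ squares with vertices in $\{0,1,2,3\}^n$, collects the layer classes, and tests them for induced $P_4$'s. The function names are hypothetical and the $P_4$ test is a brute-force search.

```python
from itertools import product


def has_induced_p4(vertices, edges):
    """Return True if the graph (vertices, edges) contains an induced P4."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    for y in vertices:
        for u in adj[y]:
            for x in adj[y] - adj[u] - {u}:
                for v in adj[u] - adj[y] - {y}:
                    if v not in adj[x]:
                        return True
    return False


def square_product(n):
    """Return the vertices of the n-fold product of squares and its n layer classes.

    Class i contains all edges that change coordinate i by one step along
    the 4-cycle 0-1-2-3-0.
    """
    vertices = list(product(range(4), repeat=n))
    classes = [set() for _ in range(n)]
    for v in vertices:
        for i in range(n):
            w = v[:i] + ((v[i] + 1) % 4,) + v[i + 1:]
            classes[i].add(frozenset((v, w)))
    return vertices, classes


V, classes = square_product(2)  # the hypercube Q_4
print([has_induced_p4(V, c) for c in classes])
print(has_induced_p4(V, classes[0] | classes[1]))
```

For $n = 2$ both classes are $P_4$-free, whereas their union, which is $Q_4$ itself, contains an induced $P_4$, so the two classes cannot be merged.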

Because of the results of a computer-aided search for $(n-1)$-partitions and
decompositions of hypercubes $Q_{2n}$, we are led to the following conjecture:

Conjecture 2 Let $k \in \mathbb{N}$ and $k > 1$. Then the $2k$-cube has no cograph
$(k-1)$-decomposition, i.e., the proposed $k$-partition of the hypercube $Q_{2k}$ in
Example 2 is also optimal.

The proof of the latter hypothesis would immediately verify the next
conjecture.

Conjecture 3 For every $k \in \mathbb{N}$ there is a graph that has an optimal cograph
$k$-decomposition.

Proving the last conjecture appears to be difficult. We wish to point out
that there is a close relationship to the problem of finding pattern avoiding
words, see e.g. [SJ[71[Sni[Mlll[2]: Consider a graph $G = (V,E)$ and an ordered
list $(e_1,\dots,e_m)$ of the edges $e_i \in E$. We can associate to this list $(e_1,\dots,e_m)$
a word $w = (w_1,\dots,w_m)$. By way of example, assume that we want to find a
valid cograph 2-decomposition $\{E_1,E_2\}$ of $G$ and that $G$ contains an induced
$P_4$ consisting of the edges $e_i, e_j, e_k$. Hence, one has to avoid assignments of the
edges $e_i, e_j, e_k$ to the single set $E_1$, resp., $E_2$. The latter is equivalent to finding a
binary word $(w_1,\dots,w_m)$ such that $(w_i,w_j,w_k) \neq (X,X,X)$, $X \in \{0,1\}$, for
each of those induced $P_4$'s. The latter can easily be generalized to finding pattern
avoiding words over an alphabet $\{1,\dots,k\}$ to get a valid $k$-decomposition.
However, to the authors' knowledge, results concerning the counting of $k$-ary
words avoiding forbidden patterns, and thus verifying if there is any such word
(or equivalently a $k$-decomposition), are basically known for scenarios like: If
$(p_1,\dots,p_l) \in \{1,\dots,k\}^l$ (often $l < 3$), then none of the words $w$ that contain
a subword $(w_{i_1},\dots,w_{i_l}) = (p_1,\dots,p_l)$ with $i_{j+1} = i_j + 1$ (consecutive letter
positions) or $i_j < i_k$ whenever $j < k$ (order-isomorphic letter positions) is
allowed. However, such findings are too restrictive for our problem, since we are
looking for words in which patterns are forbidden only at a few, but fixed,
positions. Nevertheless, we assume that results concerning the recognition of
pattern avoiding words might offer an avenue to solve the latter conjectures.
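
The word encoding described above can be made concrete for partitions, where every edge receives exactly one letter. The Python sketch below enumerates all words over $\{0,\dots,k-1\}$ for a fixed edge order and returns the first one whose letter classes are all cographs. It is a brute-force illustration with hypothetical function names. It tests each class directly for induced $P_4$'s rather than tracking forbidden positions, and it does not cover decompositions with overlapping classes.

```python
from itertools import product


def has_induced_p4(vertices, edges):
    """Return True if the graph (vertices, edges) contains an induced P4."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    for y in vertices:
        for u in adj[y]:
            for x in adj[y] - adj[u] - {u}:
                for v in adj[u] - adj[y] - {y}:
                    if v not in adj[x]:
                        return True
    return False


def find_partition_word(vertices, edges, k=2):
    """Return a word w with letter w[i] assigned to edges[i], or None.

    The word is valid if every letter class induces a cograph on the
    given vertex set. The search takes time exponential in len(edges).
    """
    edge_list = [frozenset(e) for e in edges]
    for word in product(range(k), repeat=len(edge_list)):
        classes = [set() for _ in range(k)]
        for edge, letter in zip(edge_list, word):
            classes[letter].add(edge)
        if not any(has_induced_p4(vertices, c) for c in classes):
            return word
    return None


V = [1, 2, 3, 4]
P4 = [(1, 2), (2, 3), (3, 4)]
print(find_partition_word(V, P4, k=1))
print(find_partition_word(V, P4, k=2))
```

With one letter no valid word exists for the path $P_4$, while with two letters the search returns a word that separates the last edge from the first two.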


5.1 Computational Complexity 

In the following, we will prove the NP-completeness of Cotree
2-Representation and Cograph 2-Decomposition. Additionally, these results
allow us to show that the problem of determining whether there is a cograph
2-partition is NP-complete as well.

We start with two lemmata concerning cograph 2-decompositions of the
graphs shown in Fig. 5 and 6.

Lemma 1 For the literal and extended literal graph in Figure 5 every cograph
2-decomposition is a uniquely determined cograph 2-partition.

In particular, in every cograph 2-partition $\{E_1,E_2\}$ of the extended literal
graph, the edges of the triangle $(0,1,2)$ must be entirely contained in one $E_i$
and the pending edge $[6,9]$ must be in the same edge set $E_i$ as the edges of the
triangle. Furthermore, the edges $[9,10]$ and $[9,11]$ must be contained in
$E_j$, $j \neq i$.

Proof It is easy to verify that the given cograph 2-partition $\{E_1,E_2\}$ in Fig. 5
fulfills the conditions and is correct, since $G_1 = (V,E_1)$ and $G_2 = (V,E_2)$ do
not contain induced $P_4$'s and are, thus, cographs. We have to show that it is
also unique.

Assume that there is another cograph 2-decomposition $\{F_1,F_2\}$. Note, for
any cograph 2-decomposition $\{F_1,F_2\}$ it must hold that two incident edges in
the triangle $(0,1,2)$ are contained in one of the sets $F_1$ or $F_2$. W.l.o.g. assume
that $[0,1],[0,2] \in F_1$.

Assume first that $[1,2] \notin F_1$. In this case, because of the paths $P_4 =
6 - 2 - 0 - 1$ and $P_4 = 2 - 0 - 1 - 5$ it must hold that $[2,6],[1,5] \notin F_1$ and thus,
$[2,6],[1,5] \in F_2$. However, in this case and due to the paths $P_4 = 6 - 2 - 1 - 4$
and $2 - 0 - 1 - 4$ the edge $[1,4]$ can neither be contained in $F_1$ nor in $F_2$, a
contradiction. Hence, $[1,2] \in F_1$.

Note, the square $S_{1256}$ induced by vertices $1,2,5,6$ cannot have all edges
in $F_1$, as otherwise the subgraph $(V,F_1)$ would contain the induced $P_4 =
6 - 5 - 1 - 0$. Assume that $[1,5] \in F_1$. As not all edges of $S_{1256}$ are contained in
$F_1$, at least one of the edges $[5,6]$ and $[2,6]$ must be contained in $F_2$. If only
one of the edges $[5,6]$, resp., $[2,6]$ is contained in $F_2$, we immediately obtain
the induced $P_4 = 6 - 2 - 1 - 5$, resp., $6 - 5 - 1 - 2$ in $(V,F_1)$ and therefore,
both edges $[5,6]$ and $[2,6]$ must be contained in $F_2$. But then the edge $[2,7]$
can neither be contained in $F_1$ (due to the induced $P_4 = 5 - 1 - 2 - 7$) nor in
$F_2$ (due to the induced $P_4 = 5 - 6 - 2 - 7$), a contradiction. Hence, $[1,5] \notin F_1$
and thus, $[1,5] \in F_2$ for any 2-decomposition. By analogous arguments and
due to symmetry, all edges $[0,3],[0,8],[1,4],[2,6],[2,7]$ are contained in $F_2$,
but not in $F_1$.

Fig. 5 Left: the literal graph. Right: the extended literal graph. The unique corresponding
cograph 2-partition is indicated by dashed and bold-lined edges.

Fig. 6 Shown is a clause gadget which consists of a triangle $(a,b,c)$ and three extended
literal graphs (as shown in Fig. 5) with edges attached to $(a,b,c)$. A corresponding cograph
2-partition is indicated by dashed and bold-lined edges.

Moreover, due to the induced $P_4 = 7 - 2 - 6 - 5$ and since $[2,6],[2,7] \in F_2$,
the edge $[5,6]$ must be in $F_1$ and not in $F_2$. By analogous arguments and due
to symmetry, it holds that $[3,4],[7,8] \in F_1$ and $[3,4],[7,8] \notin F_2$. Finally, none
of the edges of the triangle $(0,1,2)$ can be contained in $F_2$, as otherwise we
obtain an induced $P_4$ in $(V,F_2)$. Taken together, any 2-decomposition of the
literal graph must be a partition and is unique.

Consider now the extended literal graph in Figure 5. As this graph contains
the literal graph as induced subgraph, the unique 2-partition of the underlying
literal graph is determined by the preceding construction. Due to the path
$P_4 = 7 - 2 - 6 - 9$ with $[2,6],[2,7] \in F_2$ we can conclude that $[6,9] \notin F_2$ and
thus, $[6,9] \in F_1$. Since there are induced paths $P_4 = 5 - 6 - 9 - y$, $y = 10,11$, with
$[5,6],[6,9] \in F_1$ we obtain that $[9,10],[9,11] \notin F_1$ and thus, $[9,10],[9,11] \in F_2$
for any 2-decomposition (which is in fact a 2-partition) of the extended literal
graph, as claimed. □

Lemma 2 Given the clause gadget in Fig. 6.
For any cograph 2-decomposition, all edges of exactly two of the triangles
in the underlying three extended literal graphs must be contained in one $E_i$ and
not in $E_j$, while the edges of the triangle of one extended literal graph must be
in $E_j$ and not in $E_i$, $i \neq j$.

Furthermore, for each cograph 2-decomposition exactly two of the edges $e, e'$
of the triangle $(a,b,c)$ must be in one $E_i$ while the other edge $f$ is in $E_j$ but not
in $E_i$, $j \neq i$. The cograph 2-decomposition can be chosen so that in addition
$e, e' \notin E_j$, resulting in a cograph 2-partition of the clause gadget.

Proof It is easy to verify that the given cograph 2-partition in Fig. 6 fulfills
the conditions and is correct, as $G_1 = (V,E_1)$ and $G_2 = (V,E_2)$ are cographs.

As the clause gadget contains the literal graph as induced subgraph, the
unique 2-partition of the underlying literal graph is determined by the
construction given in Lemma 1. Thus, each edge of the triangle in each underlying
literal graph is contained in exactly one of the sets $E_1$ or $E_2$. Assume that the edges
of the triangles in the three literal gadgets are all contained in the same set,
say $E_1$. Then, Lemma 1 implies that $[9,a],[9,c],[9',a],[9',b],[9'',b],[9'',c] \in E_2$
and none of them is contained in $E_1$. Since there are induced $P_4$'s $9 - a - b - 9''$,
$9' - a - c - 9''$ and $9 - c - b - 9'$, the edges $[a,b],[a,c],[b,c]$ cannot be contained
in $E_2$, and thus must be in $E_1$. However, this is not possible, since then we
would have the induced path $P_4 = 9 - a - 9' - b$ in the subgraph $(V,E_2)$,
a contradiction. Thus, the edges of the triangle of exactly one literal gadget
must be contained in a different set $E_i$ than the edges of the other triangles in
the other two literal gadgets. W.l.o.g. assume that the 2-decomposition of the
underlying literal gadgets is given as in Fig. 6 and identify bold-lined edges
with $E_1$ and dashed edges with $E_2$.

It remains to show that this 2-decomposition of the underlying three literal
gadgets determines which of the edges of triangle $(a,b,c)$ are contained in
which of the sets $E_1$ and $E_2$. Due to the induced path $9 - a - b - 9''$ and
since $[9,a],[9'',b] \in E_2$, the edge $[a,b]$ cannot be contained in $E_2$ and thus,
is contained in $E_1$. Moreover, if $[b,c] \notin E_2$, then there is an induced path
$P_4 = b - 9'' - c - 9$ in the subgraph $(V,E_2)$, a contradiction. Hence, $[b,c] \in E_2$
and by analogous arguments, $[a,c] \in E_2$. If $[b,c] \notin E_1$ and $[a,c] \notin E_1$, then
we obtain a cograph 2-partition. However, it can easily be verified that there
is still a degree of freedom and $[a,c],[b,c] \in E_1$ is allowed for a valid cograph
2-decomposition. □

We are now in the position to prove the NP-completeness of Cotree
2-Representation and Cograph 2-Decomposition by reduction from the
following problem.

Problem Monotone NAE 3-SAT
Input: Given a set $U$ of Boolean variables and a set of clauses
$\psi = \{C_1,\dots,C_m\}$ over $U$ such that for all $i = 1,\dots,m$
it holds that $|C_i| = 3$ and $C_i$ contains no negated variables.
Question: Is there a truth assignment to $\psi$ such that in each $C_i$
not all three literals are set to the same value, i.e., at least
one literal is true and at least one is false?
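
For small instances the problem can be decided by exhaustive search. The Python sketch below reads "not all equal" as requiring at least one true and at least one false variable in every clause, as used in the proof of Theorem 10. The function name and the example instances are illustrative assumptions, not taken from the text.

```python
from itertools import product


def nae_assignment(variables, clauses):
    """Return a not-all-equal assignment as a dict, or None if none exists.

    Each clause is a tuple of three (unnegated) variables. The search
    tries all 2^|variables| assignments.
    """
    variables = list(variables)
    for values in product([False, True], repeat=len(variables)):
        f = dict(zip(variables, values))
        # A clause is satisfied iff both truth values occur among its variables.
        if all(len({f[x] for x in clause}) == 2 for clause in clauses):
            return f
    return None


# A satisfiable toy instance on six variables (hypothetical).
clauses = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
print(nae_assignment(range(1, 7), clauses) is not None)

# The seven lines of the Fano plane admit no such assignment.
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
print(nae_assignment(range(1, 8), fano))
```

The second instance has no solution because the Fano plane is not 2-colorable, so the function returns `None`.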





Marc Hellmuth, Nicolas Wieseke 



Fig. 7 Shown is the graph $\Psi$ as constructed in the proof of Theorem 10. In particular,
$\Psi$ reflects the NAE 3-SAT formula $\psi = \{C_1, C_2, C_3\}$ with clauses $C_1 = (x_1,x_4,x_2)$, $C_2 =
(x_2,x_3,x_4)$ and $C_3 = (x_4,x_5,x_6)$. Different literals obtain the same truth assignment true or
false whenever the edges of the triangle in their corresponding literal gadget are contained
in the same set $E_i$ of the cograph 2-partition, highlighted by dashed and bold-lined edges.


Theorem 9 (^ |31LI28p Monotone NAE 3-SAT is NP-complete. 

Theorem 10 Cograph 2-Decomposition, and thus, Cotree
2-Representation is NP-complete.

Proof Given a graph $G = (V,E)$ and a cograph 2-decomposition $\{E_1,E_2\}$, one
can verify in linear time whether $(V,E_i)$ is a cograph [5]. Hence, Cograph
2-Decomposition $\in$ NP.

We will show by reduction from Monotone NAE 3-SAT that Cograph
2-Decomposition is NP-hard. Let $\psi = (C_1,\dots,C_m)$ be an arbitrary
instance of Monotone NAE 3-SAT. Each clause $C_i$ is identified with a triangle
$(a_i,b_i,c_i)$. Each variable $x_j$ is identified with a literal graph as shown in Fig.
5 (left) and different variables are identified with different literal graphs. Let
$C_i = (x_{i_1}, x_{i_2}, x_{i_3})$ and $G_{i_1}$, $G_{i_2}$ and $G_{i_3}$ the respective literal graphs. Then,
we extend each literal graph $G_{i_j}$ by adding an edge $[6, 9_{i,j}]$. Moreover, we add
to $G_{i_1}$ the edges $[9_{i,1}, a_i], [9_{i,1}, c_i]$, to $G_{i_2}$ the edges $[9_{i,2}, a_i], [9_{i,2}, b_i]$, to $G_{i_3}$
the edges $[9_{i,3}, c_i], [9_{i,3}, b_i]$. The latter construction connects each literal graph
with the triangle $(a_i,b_i,c_i)$ of the respective clause $C_i$ in a unique way, see
Fig. 6. We denote the clause gadgets by $\Psi_i$ for each clause $C_i$. We repeat this
construction for all clauses $C_i$ of $\psi$, resulting in the graph $\Psi$. An illustrative
example is given in Fig. 7. Clearly, this reduction can be done in polynomial
time in the number $m$ of clauses.

We will show in the following that $\Psi$ has a cograph 2-decomposition (resp.,
a cograph 2-partition) if and only if $\psi$ has a truth assignment $f$.

Let $\psi = (C_1,\dots,C_m)$ have a truth assignment. Then in each clause $C_i$ at
least one of the literals $x_{i_1}, x_{i_2}, x_{i_3}$ is set to true and one to false. We assign
all edges $e$ of the triangle in the corresponding literal graph $G_{i_j}$ to $E_1$, if
$f(x_{i_j}) = \mathit{true}$, and to $E_2$, otherwise. Hence, the edges of exactly two of the
triangles (one in $G_{i_j}$ and one in $G_{i_{j'}}$) are contained in one $E_r$ and not in $E_s$,
while the edges of the other triangle in $G_{i_{j''}}$, $j'' \neq j,j'$, are contained in $E_s$
and not in $E_r$, $r \neq s$, as needed for a possible valid cograph 2-decomposition
(Lemma 2). We now apply the construction of a valid cograph 2-decomposition
(or cograph 2-partition) for each $\Psi_i$ as given in Lemma 2, starting with the
just created assignment of edges contained in the triangles in $G_{i_j}$, $G_{i_{j'}}$ and
$G_{i_{j''}}$ to $E_1$ or $E_2$. In this way, we obtain a valid cograph 2-decomposition (or
cograph 2-partition) for each subgraph $\Psi_i$ of $\Psi$. Thus, if there were an
induced $P_4$ in $\Psi$ with all edges belonging to the same set $E_r$, then this $P_4$ can
only have edges belonging to different clause gadgets $\Psi_k$, $\Psi_l$. By construction,
such a $P_4$ can only exist along different clause gadgets $\Psi_k$ and $\Psi_l$ if $C_k$ and $C_l$
have a literal $x_i = x_{k_m} = x_{l_n}$ in common. In this case, Lemma 2 implies that
the edges $[6, 9_{k,m}]$ and $[6, 9_{l,n}]$ in $\Psi$ must belong to the same set $E_r$. Again by
Lemma 2 the edges $[9_{k,m}, y]$ and $[9_{k,m}, y']$, $y,y' \in \{a_k,b_k,c_k\}$, as well as the
edges $[9_{l,n}, y]$ and $[9_{l,n}, y']$, $y,y' \in \{a_l,b_l,c_l\}$, must be in a different set $E_s$ than
$[6, 9_{k,m}]$ and $[6, 9_{l,n}]$. Moreover, the respective edges $[5,6]$ in $\Psi_k$ as well as in $\Psi_l$
(Fig. 5) must be in $E_r$, i.e., in the same set as $[6, 9_{k,m}]$ and $[6, 9_{l,n}]$. However,
in none of the cases is it possible to find an induced $P_4$ with all edges in the
same set $E_r$ or $E_s$ along different clause gadgets. Hence, we obtain a valid
cograph 2-decomposition, resp., cograph 2-partition of $\Psi$.

Now assume that $\Psi$ has a valid cograph 2-decomposition (or a cograph
2-partition). Any variable $x_{i_j}$ contained in some clause $C_i = (x_{i_1}, x_{i_2}, x_{i_3})$ is
identified with a literal graph $G_{i_j}$. Each clause $C_i$ is, by construction, identified
with exactly three literal graphs $G_{i_1}, G_{i_2}, G_{i_3}$, resulting in the clause gadget $\Psi_i$.
Each literal graph $G_{i_j}$ contains exactly one triangle $t_j$. Since $\Psi_i$ is an induced
subgraph of $\Psi$, we can apply Lemma 2 and conclude that for any cograph
2-decomposition (resp., cograph 2-partition) all edges of exactly two of the three
triangles are contained in one set $E_r$, but not in $E_s$, and all edges of
the other triangle are contained in $E_s$, but not in $E_r$, $s \neq r$. Based on these
triangles we define a truth assignment $f$ to the corresponding literals: w.l.o.g.
we set $f(x_i) = \mathit{true}$ if the edge $e \in t_i$ is contained in $E_1$ and $f(x_i) = \mathit{false}$
otherwise. By the latter arguments and Lemma 2, we can conclude that, given
a valid cograph 2-partition, the so defined truth assignment $f$ is a valid
truth assignment of the Boolean formula $\psi$, since no three different literals in
one clause obtain the same assignment and at least one of the variables is set
to true. Thus, Cograph 2-Decomposition is NP-complete.

Finally, because Cograph 2-Decomposition and Cotree
2-Representation are equivalent problems, the NP-completeness of Cotree
2-Representation follows. □

As the proof of Theorem 10 allows us to use cograph 2-partitions in all
proof steps, instead of cograph 2-decompositions, we can immediately infer
the NP-completeness of the following problem for $k = 2$, as well.

Problem Cograph k-Partition

Input: Given a graph $G = (V,E)$ and an integer $k$.

Question: Is there a cograph $k$-partition of $G$?

Theorem 11 Cograph 2-Partition is NP-complete.

As a direct consequence of the latter results, we obtain the following result.

Corollary 2 Let $G$ be a given graph that is not a cograph. The three
optimization problems to find the least integer $k > 1$ so that there is a Cograph
$k$-Partition, a Cograph $k$-Decomposition, or a Cotree $k$-Representation for the
graph $G$, are NP-hard.


5.2 Integer Linear Program 

Let $G = (V,E)$ be a given graph with maximum degree $\Delta$. We want to find
a cograph $k$-decomposition, resp., partition $\Pi = \{E_1,\dots,E_k\}$ for the least
integer $k$. Theorem 8 implies that the least integer $k$ is always less than or
equal to $\Delta + 1$.

We define binary variables $E^i_{x,y}$ for all $x,y \in V$ and $1 \leq i \leq \Delta + 1$ s.t.
$E^i_{x,y} = 1$ if and only if the edge $[x,y] \in E$ is contained in class $E_i$ of $\Pi$.
Moreover, we define the binary variables $M_i$ with $1 \leq i \leq \Delta + 1$ so that
$M_i = 1$ if and only if the class $E_i \in \Pi$ is non-empty in our construction. In
other words, $\sum_{1 \leq i \leq \Delta+1} M_i$ is the cardinality of $\Pi$.

In order to find the cograph decomposition, resp., partition $\Pi$ of $G$ having
the fewest number of elements we need the following objective function.

\[
\min \sum_{1 \leq i \leq \Delta+1} M_i \tag{8}
\]

If we want to find a cograph decomposition and hence, that each edge is
contained in at least one class $E_i$ of $\Pi$, we need the next constraint.

\[
\sum_{1 \leq i \leq \Delta+1} E^i_{x,y} \geq 1 \quad \text{for all } [x,y] \in E. \tag{9}
\]

In contrast, if we want to find a cograph partition and hence, that each
edge is contained in exactly one class $E_i$ of $\Pi$, we need this constraint.

\[
\sum_{1 \leq i \leq \Delta+1} E^i_{x,y} = 1 \quad \text{for all } [x,y] \in E. \tag{9'}
\]

Moreover, we must ensure that non-edges $[x,y] \notin E$ are not contained in
any class of $\Pi$, which is done with the next constraint.

\[
\sum_{1 \leq i \leq \Delta+1} E^i_{x,y} = 0 \quad \text{for all } [x,y] \notin E. \tag{10}
\]

Whenever there is a class $E_i$ containing an edge $[x,y] \in E$ and hence, if
$E^i_{x,y} = 1$, then we must set $M_i = 1$.

\[
\sum_{x,y \in V} E^i_{x,y} \leq |V|^2 M_i \quad \text{for all } 1 \leq i \leq \Delta + 1. \tag{11}
\]

Finally, we have to ensure that each subgraph $G_i = (V,E_i)$ is a cograph,
and thus, does not contain induced $P_4$'s, which is achieved with the following
constraint.

\[
E^i_{x,y} + E^i_{y,u} + E^i_{u,v} - E^i_{x,u} - E^i_{x,v} - E^i_{y,v} \leq 2 \tag{12}
\]

for all $1 \leq i \leq \Delta + 1$ and all ordered tuples $(x,y,u,v)$ of distinct $x,y,u,v \in V$.
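
The constraints above can be read as a feasibility test for a given 0/1 assignment. The Python sketch below performs this test directly instead of calling an ILP solver. It is an illustration with hypothetical names: variables are stored in a dictionary keyed by the class index and the unordered vertex pair, and Constraint (11) is checked in its logical form (a non-empty class forces its indicator to 1) so that no particular constant on the right-hand side is assumed.

```python
from itertools import combinations, permutations


def encode(classes):
    """Turn a list of edge classes into 0/1 variables x[(i, {a, b})] = 1."""
    return {(i, frozenset(e)): 1 for i, c in enumerate(classes) for e in c}


def var(x, i, a, b):
    """Value of the variable for class i and the pair {a, b} (default 0)."""
    return x.get((i, frozenset((a, b))), 0)


def is_feasible(V, E, x, m, partition=False):
    """Check the covering, non-edge, indicator and P4 constraints."""
    edge_set = {frozenset(e) for e in E}
    classes = range(len(m))
    for a, b in combinations(V, 2):
        total = sum(var(x, i, a, b) for i in classes)
        if frozenset((a, b)) in edge_set:
            # Constraint (9), or its equality variant for partitions.
            if (total != 1) if partition else (total < 1):
                return False
        elif total != 0:  # Constraint (10): non-edges belong to no class.
            return False
    for i in classes:
        # Constraint (11), logical form: non-empty class implies m[i] = 1.
        if any(var(x, i, a, b) for a, b in combinations(V, 2)) and m[i] != 1:
            return False
        # Constraint (12): no induced P4 a-b-c-d inside class i.
        for a, b, c, d in permutations(V, 4):
            lhs = (var(x, i, a, b) + var(x, i, b, c) + var(x, i, c, d)
                   - var(x, i, a, c) - var(x, i, a, d) - var(x, i, b, d))
            if lhs > 2:
                return False
    return True


V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4)]  # the path P4
two = encode([[(1, 2), (2, 3)], [(3, 4)]])
one = encode([E])
print(is_feasible(V, E, two, [1, 1], partition=True), sum([1, 1]))
print(is_feasible(V, E, one, [1]))
```

For the path $P_4$ the assignment with two classes is feasible with objective value 2, while the assignment with a single class violates Constraint (12).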

This ILP formulation needs $O(\Delta|V|^2)$ variables and $O(|E| + \Delta + |V|^4)$
constraints.